Personality Self-Assessment of Scientific and Technical Personnel


R. H. Van Zelst 


Kroh-Wagner Company 


and 


W. A. Kerr 


Illinois Institute of Technology


The reality of personality existence is multiple: there are the selves perceived by
associates, and there are “selves” (complicated self) as perceived by the self. Then,
too, there is the “paper-and-pencil self,” the “projective self,” the “under stress
self,” and other externally assessed selves. These selves tend to be unlike each other
even for the same person because they exist within different frames of reference.

It seems a reasonable hypothesis that the individual in normal society who is best
informed about an individual’s personality is that same individual himself. Further, it
appears plausible that many traits of his personality can be self-assessed with
substantial validity (1, 4, 5).


Method


Rationale. For at least three decades both psychology and psychiatry have been
preoccupied with external assessment of personality. The validity coefficients
culminating from these years of effort have been less than gratifying. In fact, for
predicting such a criterion as job success they are characteristically near zero or
non-existent (2, 4, 6). This accumulated experience might now well provoke the
researcher to ask: “Who is likely to know the most about a given personality? Can a
personality assess itself?” Have we overlooked the obvious?

Conventional paper-and-pencil personality tests attempt to infer a trait by measurement
of operational symptoms assumed to be functions of the trait. Although no measurement
ever is direct, this approach introduces the obscuring influence of such intervening
variables as shaky assumptions about symptoms and the poor reliability of symptom-type
items based even on relatively sound symptom assumptions.

The directive clinical assessment approach 
compounds the errors of the conventional test 
approach by, in effect, introducing two addi- 
tional intervening variables: (1) the person- 
ality of the clinician; and (2) the limited 
knowledge of the client possessed by the 
clinician. 

These limitations of traditional assessment seem further to suggest rewarding validity
in a self-assessment approach. The present study is based also on the assumptions that
personality assessment is most valid when: (1) the emphasis is metric rather than
impressionistic; and (2) trait concepts are maximized while verbalization is minimized.

Subjects. Subjects of this study were 514 technical and scientific personnel of the
Armour Research Foundation (79%) and the Illinois Institute of Technology (21%). Their
mean age was 31.9, standard deviation 9.1.

Procedure. A self-analysis questionnaire was constructed on the basis of Cattell’s
research (3) which lists definitive personality trait names based on factor and cluster
analysis techniques. From this list of traits 56 trait names were selected.

Each subject was guaranteed anonymity and was asked: “Please rate yourself as compared
with fellow scientists on the following traits utilizing the five point scale as
follows: ‘as compared with other scientists, I probably am 1. much less; 2. less;
3. same; 4. more; 5. much more’ acquisitive, ambitious, etc.” Each respondent also
supplied age (nearest in five-year multiples), number of publications, number of
inventions, and field of work. Of the subjects responding, 70% were in the field of
Engineering and 30% in Physical Sciences.

Criterion. The criterion against which these self-assessed traits were evaluated was the
summation of publications and inventions for each respondent. In other words the
criterion was scientific productivity. The influence of age was held constant by means
of partial correlation techniques. Mean productivity for the group was 11.4 with a
standard deviation of 19.1.




Results 


Of the 667 questionnaires distributed (via 
campus mail with explanatory letters and re- 
turn envelopes addressed to “Technical Per- 
sonnel Research”) a total of 514 (77%) 
were returned in usable form. 

These 514 self-ratings on each of 56 traits 
were then correlated (Pearsonian) with the 
productivity (inventions plus publications) 
criterion. A second series of coefficients was 
then computed using the partial method on 
the original coefficients and holding constant 
the effect of age. Both series are shown in 
Table 1. 
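The step of "holding constant the effect of age" can be illustrated with the standard first-order partial correlation formula. The paper does not print the formula it used, so the sketch below is an assumption about the method, and the three input coefficients are hypothetical values chosen for illustration, not entries from Table 1.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical coefficients (not from the paper):
r_trait_productivity = 0.40  # self-rated trait vs. publications plus inventions
r_trait_age = 0.10           # self-rated trait vs. age
r_productivity_age = 0.30    # publications plus inventions vs. age

print(round(partial_corr(r_trait_productivity, r_trait_age, r_productivity_age), 3))
```

When a trait is nearly unrelated to age, as in this made-up case, the partial coefficient stays close to the original one, which is the pattern one would expect if the two series in Table 1 differ only slightly.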

The original hypothesis that the self-rating 
approach may yield more significant validity 
coefficients than the traditional external- 
evaluation approach seems to be verified. 


Table 1


Pearsonian Correlation Coefficients between Productivity and Personality Trait and between
Productivity with Age Held Constant and Personality Trait *

[Columns follow the caption order: r with productivity, then r with age held constant.
Only coefficients that remain legible are shown.]


Trait               r      r (age constant)

Acquisitive
Ambitious
Argumentative
Assertive
Cautious
Conscientious
Constructive
Contented
Conventional
Cooperative
Curious
Cynical
Easygoing
Eccentric
Egotistical
Emotional
Enthusiastic
Evasive
Excitable
Fastidious
Formal
Frank
Friendly
Generous
Grateful            .07    .09
Habit-bound         .03    .04
Headstrong
Hurried             .07    .09
Imaginative
Impulsive
Independent
Inflexible
Inhibited
Interests-wide
Introspective
Leading
Love-work
Optimistic
Original
Patient
Painstaking
Persevering
Poised
Practical
Reliable
Reserved
Responsible
Self-Confident
Self-Controlled
Sensitive
Serious
Subjective
Suggestible
Tactful
Thoughtful
Worrying


* Italicized coefficients significant at 1% level.




This statement is based on the finding of 37 
trait coefficients significant at the 1 per cent 
level, plus an additional three at the 5 per 
cent level. This represents 74 per cent of 
the coefficients at a non-chance level, a pro- 
portion rarely found in external evaluation. 
Sixteen (30 per cent) of the coefficients are 
of magnitude .30 or higher. 
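To give a sense of what "significant at the 1 per cent level" means with 514 respondents, the sketch below computes the smallest correlation that would reach a given two-tailed level. The paper does not say how its significance thresholds were obtained; the t-test for a correlation and the normal approximation to t (reasonable with about 500 degrees of freedom) are assumptions made here for illustration.

```python
import math
from statistics import NormalDist


def critical_r(n: int, alpha: float) -> float:
    """Smallest |r| reaching two-tailed level alpha, using a normal approximation to t."""
    df = n - 2  # degrees of freedom for a zero-order correlation
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Invert t = r * sqrt(df) / sqrt(1 - r^2) for r, with t replaced by z.
    return z / math.sqrt(z * z + df)


n_subjects = 514  # sample size reported in the paper
for alpha in (0.05, 0.01):
    print(alpha, round(critical_r(n_subjects, alpha), 3))
```

With a sample this large, quite small coefficients clear the threshold, so the count of significant traits and the count of coefficients of .30 or higher answer different questions.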

These more promising coefficients, ranging 
to .61, present an interesting self-picture of 
the highly productive scientists in this study. 
As a personality group these high producers 
describe themselves as being original (.61), not contented (−.57), not conventional
(−.53), imaginative (.41), curious (.40), enthusiastic (.39), and impulsive (.39).

Also characteristic, but to a lesser degree, of these highly productive personnel are
self-descriptions of self-confident (.35), leading (.34), not worrying (−.34), not
inhibited (−.33), not formal (−.32), loves work (.32), subjective (.31), fastidious
(.30), and not acquisitive (−.30).

To the extent that these scientists are rep- 
resentative, the more productive scientist ap- 
pears to describe himself as an enthusiastic 
and impulsive personality which is original, 
imaginatively non-conformist, not contented 
with reality as it is, curious as to the nature 
of this reality, and not fundamentally acquisi- 
tive in a selfish sense. The less productive 
scientists in this study possess an opposite 
self-description pattern. 

This total pattern does not necessarily 
agree with the popular conception of the pro- 
ductive scientist. In this sample he is, for 
example, more subjective than objective in 
personality orientation; but probably this 
makes for the greater introspection which 
may be necessary for better and more origi- 
nal exploration and interpretation of the un- 
known. And our highly productive scientist 
is not characteristically cautious or inhibited; 
he is less cautious, more self-confident, and 
more impulsive than his less productive asso- 
ciates. Nor does he lurk modestly in his 
laboratory; he engages freely in leading be- 
havior. 

These results are consistent with some previous research on a related population (7, 8),
particularly in suggesting relative selflessness of motive as a significant trait in
highly productive scientists.


Summary 

A total of 514 technical and scientific per- 
sonnel of the Armour Research Foundation 
and the Illinois Institute of Technology co- 
operated in an anonymous self-administered 
self-description report on 56 definitive per- 
sonality trait names. These self-ratings were 
correlated with a criterion (inventions plus 
publications) of scientific productivity, hold- 
ing constant the effect of age by partial cor- 
relation. 

1. The original hypothesis that a self-rat- 
ing approach to personality evaluation may 
yield results of greater validity than ordi- 
narily found in the external evaluation ap- 
proach is not refuted and even appears to be 
somewhat substantiated. Sixty-eight per cent 
of the validity coefficients exceed chance 
magnitude at the 1 per cent level. 

2. As compared with the less productive, 
the more productive scientists in this study 
described themselves as more original, less 
contented, less conventional, more imagina- 
tive, more curious, more enthusiastic, more 
impulsive, and, somewhat less definitely, more 
self-confident, more leading, less worrying, 
less inhibited, less formal, more liking for 
work, more subjective, more fastidious, and 
less acquisitive. 

Received July 28, 1953. 
Personality Correlates of Social Conformity


Raymond E. Bernberg 


Los Angeles State College 


The author has recently introduced a scale 
(2) which is presumed to measure social con- 
formity. Social conformity is defined as the 
tendencies of members of a society to mani- 
fest communality of attitudes and of behavior 
as a result of the restrictive influences of cul- 
ture and society in personality development. 

The scale utilizes the direction of percep- 
tion technique of attitude measurement (1). 
It is a projective-type paper-and-pencil test. 
The content of the items of the scale was 
drawn from the following determinant areas 
of social conformity: (1.) moral values; (2.) 
positive goals; (3.) reality testing; (4.) 
ability to give affection; (5.) tension level; 
and (6.) impulsivity. 

Examples of the type of items are: 

Statistics show what percentage of men like to write things on the walls in men’s
rooms?

(a) 27%   (b) 40%   (c) 53%   (d) 66%   (e) 70%


Public Opinion Polls show what percentage of people feel it is silly to make close
friendships because few people can really understand you?

(a) 30%   (b) 40%   (c) 50%   (d) 60%   (e) 70%

The scoring of the 37 items of the scale is 
based upon a weighted key determined by 
previous experimentation (2). Validation of 
the scale was determined by behavioral cri- 


teria (2). The criterion groups used were 
adult male and female prison inmates; young 
male prison inmates; and regular white Prot- 
estant church-going groups; other groups 
used were college populations of all ages and 
both sexes and police officers of the Los An- 
geles Environs. 

The scale is an attempt at approaching a 
dimension of personality from a different level 
than is usual in most personality tests. It 
attempts to measure an aspect of personality 
organization as reflected in attitudes derived 
from cultural and societal influences. In ad- 
dition, it is an indirect method of attitude 
measurement. 

The problem with which we are concerned 
is: What relationships exist between this 
scale and other direct measures of person- 
ality? 

Procedure 

Subjects. The subjects utilized to relate 
the measure of social conformity (SC) to 
other differing personality measures are 89 
female social welfare case workers and super- 
visors from the Los Angeles County Bureau 
of Public Assistance. They extend broadly 
as to age and work experience. 

Method. The subjects were administered the SC scale and the Guilford-Zimmerman
Temperament Survey (GZ) (4). This latter personality scale is a direct method of item
questioning. The scale is broken down into ten traits which were derived by
factor-analysis procedures. They are: (1.) General activity (G); (2.) Restraint (R);
(3.) Ascendance (A); (4.) Sociability (S); (5.) Emotional stability (E);
(6.) Objectivity (O); (7.) Friendliness (F); (8.) Thoughtfulness (T); (9.) Personal
relations (P); and (10.) Masculinity (M).

A high score on any trait of the GZ indicates a “positive” quality. A high score on
the SC indicates a socially undesirable degree of nonconformity.


Table 1
Mean Scores and Sigmas of the Social Welfare Case Worker Group (N = 89) Compared to the
GZ Women Norms and a Standard Population Group on SC

[Table body not recoverable: means (M) and standard deviations (S.D.) for the social
workers and the normative groups † on each GZ trait and on SC.]

* Signif. Diff. .001 level between groups on these traits.
† N was 136 for Trait T, 300 for SC, and 389 for the remaining traits.


Table 2

Intercorrelation Matrix of the GZ Traits and SC
(Pearson Product-Moment Coefficients)

[Matrix entries not recoverable.]


Results 

Table 1 indicates the mean scores and sig- 
mas of the sample obtained in this study com- 
pared to the GZ statistics for the women (4). 
In the case of the SC data, the sample is com- 
pared to statistics derived from a standard 
population (2). The social welfare case 
workers appear to be higher on the R, E, and 
O traits and similar to the normative popula- 
tion on all others. 

Table 2 presents a matrix of the intercorrelations of the GZ traits and SC. The
intercorrelations of the GZ traits derived from this study differ considerably from the
GZ data (3), which are based upon male subjects and are tetrachoric coefficients.
However, one must realize that both samples are highly select. This would indicate the
possibly high degree of variation in results one would expect of the GZ scale in terms
of differential descriptive characteristics of a population.

The Gengerelli method of “factor exhaustion” (3) was used to find what factors or
subtests of the GZ scale provide maximum prediction for the SC measure, but no
significant increment in prediction in addition to that between O and SC was obtained.


Summary 


1. A group of 89 female social welfare case 
workers and supervisors were administered 
the Guilford-Zimmerman Temperament Sur- 
vey and a social conformity scale. The pur- 
pose of the study was to find personality cor- 
relates of social conformity. 

2. The sample obtained in this study was 
highly select and differed somewhat from a 
normative female population on the GZ scale, 
being significantly higher on restraint, emo- 
tional stability, and objectivity. The inter- 
correlations between the traits for this sam- 
ple also differed considerably from those pre- 
sented by the authors of the GZ scale. 

3. The relationship between the two scales appears to be limited to the −.47 correlation
between the Objectivity factor on the GZ and social conformity.


Received July 2, 1953 
Peer Nominations on Leadership as a Predictor of the Pass-Fail
Criterion in Naval Air Training *


E. P. Hollander * +
U. S. Naval School of Aviation Medicine, Pensacola, Florida


Studies by Williams and Leavitt (5) at the 
Marine Corps Officer Candidate School, by 
Wherry and Fryer (4) at the Signal Corps 
Officer Candidate School, and, more recently, 
by McClure, Tupes, and Dailey (3) at the 
Air Force Officer Candidate School have lent 
substantiation to the validity of peer nomina- 
tions on leadership against various perform- 
ance and operational criteria. 

Summing up their findings, based on a factor analysis, Wherry and Fryer conclude that
“Buddy ratings appear to be the purest measure of ‘leadership.’ Nominations by class
appear to be better measures of the leadership factor than any other variable”
(4, p. 157).
Williams and Leavitt note that “. . . socio- 
metric group opinion was a more valid pre- 
dictor both of success in Officer Candidate 
School and of combat performance than sev- 
eral objective tests” (5, p. 291). They con- 
clude that the relative superiority of group 
opinion is attributable to the fact that “group 
members have more time to observe each 
other than do superior officers, they know 
each other in a realistic social context, and 
they react directly to each other’s social- 
dominance behavior. All these are condi- 
tions favorable to informed judgment” (5, 
p. 291). 

In addition to adequately fulfilling conditions of validity, the nominating technique
has been found to meet acceptable standards of reliability. Thus, Wherry and Fryer
report that “. . . the reliability of nominations after four months is outstandingly
higher than that of any of the other variables upon which the test was made. This is
probably further evidence of the fact that the nominating technique has the property of
early identification of the members of the group who constitute the two extremes of the
leadership distribution” (4, p. 159). In a recent evaluation of peer ratings among
Marine Corps trainees, an average reliability coefficient over a two-week period of .71
is reported by Anderhalter et al. (1, p. 26).


* Now in the Department of Psychology, Carnegie Institute of Technology.

+ Grateful acknowledgment is extended to E. R. Sausser, Jr. for his valuable aid in the
pursuance of this study.

1 Opinions or conclusions contained in this report are those of the author. They are
not to be construed as necessarily reflecting the view or the endorsement of the Navy
Department.


Problem 


The evidence supporting the validity of the 
peer nomination technique is clear-cut. Here- 
tofore, however, the criteria utilized for vali- 
dation have quite properly tended to be di- 
rectly related to the initial character of the 
nomination. It has been assumed, with good 
reason, that peer nominations on leadership 
should be expected to correlate with a cri- 
terion derived from some variety of leader- 
ship behavior or performance measure. On 
the other hand, there exists very little re- 
search regarding the applicability of peer 
nominations on leadership to performance or 
operational criteria presumably unrelated to 
leadership behavior. It may well be that the 
so-called “leadership nominations” identify 
characteristics of the individual which relate 
to criteria in the spheres of cognition, or per- 
sonal adjustment, or such a complex as ability 
to successfully solo an aircraft. With this prospect in view, the current investigation
set forth to explore a fundamental relationship, that is, peer nominations on
leadership, during pre-flight school, and success or failure through the whole of
flight training. Fundamentally, two questions were posed: Do peer nominations on
leadership during pre-flight correlate significantly with a pass-fail criterion for the
entire flight training program? And how well do these nominations predict this
criterion compared to other variables from the same stage of training, i.e.,
pre-flight?
Procedure


A total of 268 Naval Aviation Cadets who en- 
tered pre-flight training during late 1951 were 
taken as a study sample. This group consisted 
of nine consecutively formed “sections” of about 
thirty cadets each. The cadets had already been 
preselected for the training program on criteria 
of physical fitness, age, minimum educational 
level, intelligence, mechanical aptitude, and back- 
ground characteristics. 

At the end of their third month of pre-flight 
training, each section was administered a leader- 
ship nomination form which presented the indi- 
vidual cadet with a list of his sectionmates from 
which he was asked to nominate the three men 
from the list best qualified for the hypothetical 
position of “student commander” and the three 
men least qualified. Furthermore, the instructions
specifically stated that the nominator was
to evaluate his nominees with regard to their 
“present and eventual success as military lead- 
ers.” In this way, it was anticipated that con- 
fusion regarding the “leadership standard” to be 
applied would be obviated. It should be noted, 
too, that cadets were directed to ignore athletic 
ability as a factor in their nominations. This 
was considered to be a necessary and desirable 
part of the set in order to place some control on 
an ability which seemed likely to be closely re- 
lated to physique. Nominations were weighted 
+3 for “highest,” +2 for second highest, and +1 for third highest; similarly, weights
of −3, −2, and −1 were assigned for the three corresponding “low” categories. A
summation of these weights for each cadet was then taken as his leadership nomination
score. The distribution of such scores was unimodal and approximately symmetrical. A
standard score transformation was then utilized to afford a comparable index of the
cadet’s relative standing on leadership within his own section.*
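The scoring rule just described can be sketched in a few lines. The weights are those given in the text; the ballot layout, the cadet labels, and the use of an ordinary z-score as the "standard score transformation" are assumptions made for illustration, since the paper does not spell out the transformation.

```python
from statistics import mean, pstdev

# Weights from the text: +3, +2, +1 for the three "high" choices and
# -3, -2, -1 for the three "low" choices, most extreme choice first.
HIGH_WEIGHTS = (3, 2, 1)
LOW_WEIGHTS = (-3, -2, -1)


def nomination_scores(ballots, members):
    """Sum the nomination weights each cadet receives within one section.

    Each ballot is a pair (high, low): three names ordered from best qualified,
    and three names ordered from least qualified (hypothetical layout).
    """
    scores = {name: 0 for name in members}
    for high, low in ballots:
        for name, weight in zip(high, HIGH_WEIGHTS):
            scores[name] += weight
        for name, weight in zip(low, LOW_WEIGHTS):
            scores[name] += weight
    return scores


def standardize(scores):
    """Convert raw scores to z-scores within the section (assumed transformation)."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {name: 0.0 for name in scores}
    return {name: (value - mu) / sigma for name, value in scores.items()}


# Hypothetical six-man section with two ballots.
section = ["A", "B", "C", "D", "E", "F"]
ballots = [
    (["A", "B", "C"], ["F", "E", "D"]),
    (["B", "A", "C"], ["F", "D", "E"]),
]
raw = nomination_scores(ballots, section)
z = standardize(raw)
print(raw)
print({name: round(value, 2) for name, value in z.items()})
```

Standardizing within each section keeps a cadet's index comparable across sections that differ in size or in how nominations happened to spread.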

In addition to this leadership score (LDR) de- 
rived from peer nominations, a number of other 
measures on the cadets were available from pre- 
flight. These were: ACE (College Level) Test 
scores obtained during the cadet’s first week in 
training; Officer-Like-Qualities score (OLQ) as- 
signed at the end of pre-flight by the officers in 
command to evaluate the cadet on qualities of 
leadership, military bearing, discipline and the 
like; and final pre-flight average (F.AV.) based 
upon performance in all courses.* 

* This technique derives substantially from one de- 
veloped as part of ONR Contract No. N onr—o-3400 
by Richardson, Bellows, Henry and Company, 

*It should be noted that at no time were scores 
achieved by cadets on peer nominations available to 
authorities in the Training Command who assign 
OLQ or performance grades to cadets. Moreover, 
the cadets themselves did not know the ACE scores, 
final grades, or OLQ grades of their sectionmates at 
the time they made their nominations on leadership. 


After a period of some eighteen months had 
elapsed, a follow-up of the study sample revealed 
that of the 268 cadets involved, 179 had passed 
flight training and had received their wings, 32 
had failed flight training, 28 had withdrawn from 
training voluntarily, and the balance, 29 cadets, 
had been separated from the training program 
as a result of physical disqualification, illness, 
violation of contract, or some similar reason. 
With criterion groups thus established, a matrix 
of intercorrelations among the predictor variables 
was constructed and biserial r’s were computed 
for each of these variables against the pass-fail 
criterion. 
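A biserial r relates a continuous predictor to a pass-fail criterion that is treated as a cut on an underlying normal variable. The sketch below uses the textbook formula, in which the difference between group means is scaled by the standard deviation of all scores and by pq/y, where y is the normal ordinate at the cut. The paper does not give its computing formula, so this is an assumed form, and the scores shown are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(scores, passed):
    """Biserial correlation between continuous scores and a pass/fail dichotomy."""
    pass_scores = [s for s, ok in zip(scores, passed) if ok]
    fail_scores = [s for s, ok in zip(scores, passed) if not ok]
    if not pass_scores or not fail_scores:
        raise ValueError("both criterion groups must be non-empty")
    spread = pstdev(scores)  # standard deviation of all scores combined
    if spread == 0:
        raise ValueError("scores must vary")
    p = len(pass_scores) / len(scores)  # proportion passing
    q = 1.0 - p
    std_normal = NormalDist()
    y = std_normal.pdf(std_normal.inv_cdf(p))  # normal ordinate at the p/q cut
    return (mean(pass_scores) - mean(fail_scores)) / spread * (p * q / y)


# Hypothetical predictor scores for eight cadets, six passing and two failing.
scores = [58, 52, 49, 47, 55, 44, 50, 45]
passed = [True, True, True, True, True, True, False, False]
print(round(biserial_r(scores, passed), 3))
```

Because the formula assumes a normal variable underneath the dichotomy, small or lopsided groups can push the estimate outside the range of an ordinary correlation, which is worth keeping in mind when one criterion group is much smaller than the other.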

Findings 


Table 1 presents the matrix of intercorrela- 
tions, validity coefficients, and beta weights 
for the four predictor variables. Among 
these, it is apparent that final pre-flight av- 
erage (F.AV.) predicts the pass-fail criterion 
at the highest relative level and with the
greatest weight. This tends to reinforce the
finding of an earlier study on pre-flight
grades as predictors of flight performance 
(2). Second to final average, however, is the 
leadership score (LDR) which the cadet re- 
ceived from the nominations made by his 
sectionmates before he entered the flight 
phase of training, and well over a year prior 
to the time he might receive his wings. It 
should be noted, too, that the magnitude of 
the difference between the validity coefficients 
for F.AV. and LDR may readily be ascribed 
to chance fluctuations. Superiors’ ratings on 
qualities related to leadership (OLQ) yield a 
validity coefficient which is positive but non- 
significant statistically; its beta weight is of 
a relatively low order as well. Scores on the 
ACE Test appear to have decidedly limited 
predictive value against the criterion. On 
the whole, then, the validity coefficients and 
beta weights for final pre-flight average and 
peer nominations on leadership suggest that 
these two variables are of greatest relative 
validity among those considered from the pre- 
flight level of training. The multiple R ob- 
tained for these variables was calculated to 
be .33. 
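How beta weights and a multiple R follow from validity coefficients and an intercorrelation can be shown with the two strongest predictors alone. The inputs below are figures reported in this paper: validities of .28 for F.AV. and .27 for LDR, and the .50 correlation between them mentioned in the Discussion. Treating the biserial coefficients as ordinary correlations and leaving out OLQ and ACE are simplifying assumptions, so this is an illustration rather than a reconstruction of the author's four-variable solution.

```python
import math


def multiple_r_two_predictors(r1: float, r2: float, r12: float):
    """Multiple R and standardized beta weights for two predictors of one criterion.

    r1, r2: validity of each predictor; r12: correlation between the predictors.
    """
    denom = 1.0 - r12 ** 2
    if denom <= 0.0:
        raise ValueError("predictors must not be perfectly correlated")
    beta1 = (r1 - r2 * r12) / denom
    beta2 = (r2 - r1 * r12) / denom
    r_squared = beta1 * r1 + beta2 * r2
    return math.sqrt(max(r_squared, 0.0)), beta1, beta2


# Figures reported in the paper; the two-predictor model itself is an assumption.
big_r, beta_fav, beta_ldr = multiple_r_two_predictors(r1=0.28, r2=0.27, r12=0.50)
print(round(big_r, 3), round(beta_fav, 3), round(beta_ldr, 3))
```

Under these assumptions the two-predictor R comes out slightly below the reported four-variable value of .33, which fits the statement that OLQ and ACE add little to the prediction.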


Discussion 


Considering the highly select nature of the population from which the sample was drawn,
the complexity of the criterion applied, and the time differential between the
predictor and criterion variables, the multiple R of .33 takes on stature. The fact,
too, that under these conditions peer nominations on leadership should predict the
criterion is still more surprising. While a coefficient of .27, accounting for
approximately 7% of the variance, is not striking by itself, in relative terms it
suggests that peer nominations at an early level of training may account for unique
variance in predicting the criterion. A number of hypotheses are entertained below in
an attempt to derive meaning from the obtained relationship.


Table 1

Intercorrelations, Validity Coefficients, and Beta Weights for Four Predictor Variables
from Pre-Flight Against a Pass-Fail Criterion from Flight Training †


In the first place, it may be asserted as a rea- 
sonable assumption that peer nominations on 
leadership reflect a cadet’s social acceptance 
within the cadet group. Hence, those cadets 
who are low on leadership are apt to be social 
isolates as well. Their assimilation within cadet 
groups may be limited and their probability of 
successful completion of the total program may 
be diminished correspondingly. If it is further 
assumed, however, that such individuals are as 
likely to withdraw from training as they are to 
fail, it should follow that a validity coefficient 
for peer nominations taken against a pass-with- 
draw criterion should yield an approximation to 
the coefficient secured with the pass-fail criterion. 
A test of this hypothesis, by actual computation 
of this coefficient, yielded an r of .07; the hypothesis
was accordingly rejected.

This leads to the consideration that perhaps a record of inadequate achievement at
pre-flight, by the then potential failures, is of significance in determining their
leadership scores. The correlation of .50 between final pre-flight average and peer
nominations lends credence to the ascription of influence by the former variable on the
latter. While this hypothesis is basically sound, it does not completely or
satisfactorily speak to the question posed because of the weight which peer nominations
achieve independently of final average.

† All validity coefficients have an N of 211.

Another point of departure, from a somewhat 
different frame of reference, is that the kind of 
person who assimilates well in cadet groups— 
and who may consequently be expected to secure 
leadership status—may be the same kind of per- 
son who is reacted to favorably by instructors in 
flight training. This influence may be particu- 
larly felt when a cadet is in difficulty and is pre- 
sented before a board of officers to determine 
whether he is to be failed from flight training. 
Should he impress the board favorably by cer- 
tain subtle interpersonal mechanisms, he may un- 
intentionally be accorded a more sympathetic 
hearing than will others. Whether before a 
board such as this or on the flight line, it seems 
probable that there is some weight introduced in 
favor of the more verbally fluent and socially 
facile individual. It is quite conceivable, there- 
fore, that the obtained predictive quality of peer 
nominations might be accounted for in terms of 
some pervasive value through training of social 
characteristics such as these. 

From this discussion, the points made will be 
seen to fall within two categories of conjecture:
first, that the complex “leadership qualities,” as
defined by peer nominations, subsumes individual 
characteristics which are intrinsically related to 
the successful completion of flight training; 
second, that peer nominations tap a facet of the 
individual which is also perceived and reacted to 
by those who evaluate his performance in flight 
training. These categories certainly need not be 
conceived of as mutually exclusive of one an- 
other. 

Whatever factors may be found to underlie the 
relationship between peer nominations on leadership and the pass-fail criterion from
flight training, it is fundamentally true that neither variable
is of a simple, unidimensional structure. In 
order, therefore, to distill out their commonality 
in meaningful psychological terms, further re- 
search is indicated. It would appear reasonable 
to consider that the first step in such a direction 
should be to have nominators verbalize the cri- 
teria by which they make their judgments of 
leadership. Beyond this, it would also be de- 
sirable to undertake full-scale research with a 
peer nomination form specific to the nominator’s 
estimate of the nominee’s potential for success- 
ful completion of flight training. 

In any event, it seems likely that the peer 
nomination technique may have utility far ex- 
ceeding current practice or expectation. The 
“informed judgment” of group opinion might 
well be profitably exploited further. 


Summary 


A study was conducted to determine the re- 
lationship between peer nominations on lead- 
ership during pre-flight and a pass-fail cri- 
terion from Naval Air Training. At the end 
of three months of pre-flight training, nine 
sections of Naval Aviation Cadets, a sample 
of 268 cases, were asked to nominate mem- 
bers of their section as best or least qualified 
for a military leadership position. Leader- 
ship scores were derived for each cadet. Three 
other scores were also obtained for the cadets 
from the pre-flight level of training: ACE 
Test; Officer-Like-Qualities grade (OLQ), as- 
signed by officers in charge; and final over- 
all pre-flight average (F.AV.).  Biserial r’s 
were computed for each of these variables 
against pass-fail criterion data from flight 
training. Appropriate beta weights were also 
derived and a multiple R calculated. 

The findings of this study were these: 

1. Peer nominations on leadership (LDR) 
predicted the pass-fail flight criterion at a 
significant level (r = .27). 

2. However, final pre-flight average (r = .28) was of virtually equal value as a
predictor.

3. Neither OLQ (r = .18) nor ACE Test (r = .07) predicted the flight criterion
significantly.

4. The multiple R for these four predictor 
variables against the criterion was .33. The 
beta weights obtained indicated that LDR 
and F.AV. were bearing the load of prediction. 

It was concluded that peer nominations on leadership, at the pre-flight level, might
hold unique variance in predicting the pass-fail flight criterion. This was tentatively
held to be attributable to the dual considerations that: first, peer nominations might
subsume characteristics intrinsically related to success in flight training and,
second, that peer nominations might tap a facet of the individual which is also
perceived and reacted to by those who evaluate performance in flight training. Some
implications for subsequent research were delineated with the suggestion that this
technique be applied further.


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The Retest Consistency of Army Alpha after Thirty Years * 


William A. Owens, Jr. 
The lowa State College 


Personnel decisions are made every day 
which imply the long-term consistency of re- 
sults obtained from group tests of intelligence. 
It is, however, relatively rare to be able to re- 
test a group of adults with the identical meas- 
uring instrument originally employed and 
over a period of time equal to only a little 
less than one-half the average life span. The 
present paper is therefore devoted to a brief 
description of some results obtained under 
these conditions. 

Table 1

Test-Retest Correlations of Army Alpha Subtests and Total Score
after Thirty Years for 127 Iowa State College Freshmen Men

Subtests                        Coefficients as printed
                                (column headings illegible)

1. Following directions         .30    .49
2. Arithmetical reasoning       [illegible]
3. Practical judgment           .56
4. Opposites                    .64    .93
5. Disarranged sentences        .48    .87
6. Number series completion     .62
7. Analogies                    .56
8. Information                  [illegible]    .73

Total Score                     .77    .97


The data were gathered in connection with
an investigation of the effects of age upon
mental abilities; this research has been reported
in detail elsewhere by Owens (2). In
this context, 127 males of mean age nineteen
years who had originally taken Army Alpha,
Form 6, as freshmen at Iowa State College
during early 1919 were retested with identical
copies of this same examination during 1950.

* The basic data upon which this brief note is
based were gathered under the terms of a contract
grant from The Office of Naval Research; however,
the opinions and assertions contained herein are those
of the writer and are not to be construed as official
or reflecting the views of the Navy Department or
the naval service at large.

The results of this testing and retesting 
have been incorporated in the two tables 
which follow. Table 1 shows the correlations 
for each subtest. Table 2 shows a condensed 
scatter table for Total Score. Taken together 
they indicate rather remarkable consistency, 
since the range of talent in a college population
largely composed of graduates ¹ is surely
greatly restricted, and since the basic study 
previously mentioned suggests that the thirty- 
year age increment involved did affect indi- 
viduals differentially. In each instance the 
retest coefficients * (r) may be compared
with corrected odd-even coefficients for the
1919 testing (r..); and, since these latter are
considerably inflated by undue speeding in
at least three instances, Gulliksen (1) lower
limit estimates (formula 24) (r;;) are also
included where appropriate. 

If a conclusion is warranted, it would seem 
to be that personnel decisions posited upon 
the long-term consistency of results obtained 
from our better intellective tests are reason- 
ably well founded. 

Received June 26, 1953.


¹ Mean education is 4 to 5 years beyond high
school.

* All stated magnitudes are product-moment cor- 
relations computed from normalized standard scores 
and not from the grouped raw scores as plotted. 


Table 2

1919 vs. 1950 Army Alpha Scores: Total Score for 127 Iowa State College Freshmen Men

[Condensed scatter table of 1919 score intervals (50-74 through 175-200) against 1950 score intervals (100-124 through 175-200, with totals). Only fragments of the cell frequencies survive in the source: 1, 5, 4; 14, 15, 31; 26, 4.]


Reliability and Validity of the Kopas Personnel Test Battery


Philip Ash 
Inland Steel Company, East Chicago, Indiana 


The Kopas Personnel Test Battery (3) 
includes seven tests which purport to meas- 
ure: (A) Ability to Think in Mechanical 
Terms; (B) Knowledge of Math and Science; 
(C) Preference for Non-Routine Work; (D) 
Ability to Get Along with People; (E) Emo- 
tional Stability; (F) Ambition; and (X) 
Manual Coordination. A novel characteristic 
of the tests (except the Manual Coordination 
Test) is their method of administration and 
scoring. The test questions are printed on 
cards mounted on panels. The answers to 
each question are printed around a dial, and 
the examinee indicates his choice by turning 
a pointer to the proper point of the dial. 
The tests are scored by noting the position 
of the pointers on the back of the board 
where the correct dial positions are marked. 
The device eliminates answer forms and 
“pencil and paper” administration. It also 
fails to provide a permanent record of item
responses. Furthermore, the equipment ne- 
cessitates individual administration. 

Some of these tests are novel in intention 
and definitions, although it is not clear that 
the items in them (e.g., Ambition, Emotional 
Stability) measure what the names claim 
they measure. 

Neither the Test Manual (3) nor a mimeographed
bulletin by the author on personnel
testing (4) includes any reliability data, and
both give very sparse validity information. Only one
published study, on the test’s reliability (1), 
seems to have appeared. In it, Baxter re- 
ported that “correspondence with companies 
reported as users of the tests revealed little 
conclusive data on validity and no data on 
reliability.” 

The bulletin on the tests (4) suggests that 
the original validation was based on a sample 
of 480 employed workers, but no validity co- 
efficients are offered. 

This paper summarizes several reliability 
studies and one attempt to evaluate the test’s 
validity. 


Reliability 


Baxter (1) calculated both split-half and 
Kuder-Richardson reliabilities for a sample 
of 100 applicants for hourly positions at the 
Owens-Corning Fiberglas Corporation. Chew 
(2) calculated split-half coefficients for a 
sample of 249 male applicants of a steel mill. 
On a sample of steel mill applicants (Inland 
Steel Company) who were hired, test-retest 
coefficients were computed. For this sample, 
the retest interval was three months. 

The results of these studies are summarized 
in Table 1. In general these results are 
fairly consistent, and suggest that the tests 
are not reliable enough for individual predic- 
tion. No reliability study is reported for the 
Manual Coordination test. 


Validity 

The battery, including the Manual Coordi- 
nation Test, was given to a sample of 88 em- 
ployed plant protection officers for whom su- 
pervisory ratings were collected. For the rat- 
ings, the plant protection officers were divided 
into six groups: the top (best) sixth of the
force, the second sixth, and so on down to 
the bottom (worst) sixth. 

The score for test X, Manual Coordination, 
is originally a time score, ranging from about 
two to six minutes. Since time scores are not 
readily amenable to treatment as moment 


Table 1

Coefficients of Reliability for Kopas Subtests

        Baxter         Baxter        Chew          Inland Steel
        Split-Half     Kuder-        Split-Half    Test-Retest
Test    (Corrected)    Richardson    (Corrected)

A       .59            .69           .66           [illegible]
B       .77            [illegible]   [illegible]   .76
C       .87            [illegible]   .93           [illegible]
D       .59            [illegible]   [illegible]   .65
E       [partly illegible; surviving entries: .58, .72, .70]
F       [illegible]


Table 2

Means, Standard Deviations, and Intercorrelations of the Subtests of the
Kopas Battery and Supervisory Ratings
(Sample: 88 Plant Protection Officers)

[Only the first column of the matrix survives in the source.]

Test                                  Mechanical

Mechanical                            -
Math and Science                      [garbled: 00]
Ability to Get Along with People      .24
Preference for Non-Routine Work       -.07
Emotional Stability                   -.05
Ambition                              [garbled: 5]
Manual Coordination                   .23
Supervisory Ratings                   .20

Mean                                  15.1
S.D.                                  5.0

statistics, these scores were converted to
speed scores:

\[ \text{Speed Score} = \frac{1{,}000}{\text{time score}} \]

The intercorrelation matrix is reported in 
Table 2. Considered as a battery, the test 
intercorrelations are satisfactorily low. With 
the exception of the correlation of .6 between 
tests A (Mechanical) and B (Math and Sci- 
ence), these intercorrelations are generally 
not significantly greater than zero. 

However, the criterion correlations are all 
equally low. A multiple correlation was cal- 
culated using the Wherry-Doolittle test selec- 
tion method (5). The maximum shrunken 
multiple correlation coefficient obtainable, 
with tests B, C, and D, was .348. This is
far too low for effective use as a selection de- 
vice. The conclusion is warranted that the 
battery does not successfully predict perform- 
ance rating of plant protection officers. While 
there is no basis for inference concerning the 
battery’s validity for other occupations, these 
findings underscore the need for specific va- 
lidity determinations. The unusual character 
of some of the tests (e.g., measuring “Ambition”
as a function of the diversification of
leisure-time interests) makes this even more 
necessary. The uncertain reliability of the 
tests in the battery, however, suggests that 
they hold little promise as effective personnel 
tools.


Summary 


1. In two independent samples, the internal
consistency reliabilities of the six nonperformance
tests in the Kopas Battery ranged
from .58 to .93. In another sample, test-retest
reliabilities ranged from .40 to .87.

2. For a sample of 88 plant protection officers,
the seven tests in the battery were generally
uncorrelated with one another (tests A and B
excepted), and correlations
with Supervisory Ratings were negligible.
The multiple correlation was .348.

3. These results indicate the need for care- 
ful validation, if the tests are to be used. 
They suggest, in view of the limitations on 
test reliability, that the battery may not 
prove to be of significant value in employee 
selection.


Response Reliability of the Activity Vector Analysis


James N. Mosel 


The George Washington University 


The Activity Vector Analysis (AVA) ¹ is
a new personality appraisal instrument which 
appears to be gaining wide popularity in in- 
dustrial personnel circles. Although the test 
has received notice in business and industrial 
publications, there have been no reports in 
the professional psychological literature on 
its reliability or validity. The only study of 
which the writer is aware is an abstract by 
Dorcus and Jones (1, p. 402) of unpublished 
data on machinists received from the test’s 
originator, W. V. Clarke. These data showed 
some validity against the criterion of super- 
visors’ ratings. 

The test consists of 81 descriptive adjec- 
tives, such as “easy-going,” “high-spirited,” 
“impulsive.” The testee is requested to check 
all items which have ever been used by any- 
one in describing him, and then from the same 
list all items which he believes are truly de- 
scriptive of himself. He draws a line through 
any item which he does not understand. (For 
convenience of reference, the two sets of 
checked items will be referred to as the 
“other” and “self” choices.)

The scoring and interpretation of the test 
can be learned only by undergoing training 
administered by the test’s originator. The re- 
sults are presented in the form of a summary 
profile of six scores (an activity score and five 
vector scores), accompanied by an evaluative 
report covering the following topics: sum- 
mary of major characteristics, environmental 
conditions, work requirements, social con- 
tacts, supervision required, accident poten- 
tial, and a final over-all comment. This in- 
formation is then related to the needs of a 
previously analyzed job. 

In a test of this type there are two problems
of reliability: the test-retest consistency
of the testee (“response reliability”) and the
reliability of the interpreter’s judgments. It
is evident that without adequate response
reliability there can be no interpreter reliability,
nor can the interpretations have validity.

¹ Copyright, 1950, by Walter V. Clarke

The present paper reports some preliminary 
evidence on the question of response reli- 
ability. Since the method of scoring the 
AVA is not available in open source, the re- 
liability of the responses rather than the 
scores was the object of study. 


Procedure 


Fifty-two employed adults in evening classes 
of a university were administered the AVA 
on two occasions, separated by an interval of 
about two weeks. Instructions at the first 
administration were designed to conceal the 
fact that the test would be given a second 
time. 

As a measure of retest consistency, the 
common elements or overlap coefficient of 
correlation (2, pp. 120-123) was computed 
for each individual.² This coefficient is a
measure of the extent to which an individual 
selects the same items on both trials; it is 
essentially the proportion of overlap between 
his first and second sets of choices. A simi- 
lar technique has been used by Zubin (3) in 
measuring the similarity between two indi- 
viduals on a check list. A value of 1 means 
that the same items were selected on both 
trials; zero means that there were no items 
common to both trials. 
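
The overlap coefficient described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each administration is stored as a set of checked adjectives and using the common elements form, the number of shared items divided by the square root of the product of the two set sizes. The function name is hypothetical, and "calm", "bold", and "shy" are invented items; only "easy-going", "high-spirited", and "impulsive" appear in the text.

```python
from math import sqrt


def overlap_coefficient(first_trial: set, second_trial: set) -> float:
    """Common elements coefficient: n12 / sqrt(n1 * n2)."""
    n1, n2 = len(first_trial), len(second_trial)
    if n1 == 0 or n2 == 0:
        raise ValueError("each trial must contain at least one checked item")
    n12 = len(first_trial & second_trial)  # items chosen on both trials
    return n12 / sqrt(n1 * n2)


# Hypothetical responses for one testee on the two administrations.
trial_1 = {"easy-going", "high-spirited", "impulsive", "calm"}
trial_2 = {"easy-going", "impulsive", "bold", "shy"}
r12 = overlap_coefficient(trial_1, trial_2)
```

Identical sets give a coefficient of 1 and disjoint sets give zero, matching the interpretation stated in the text.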


Results 


Table 1 shows the distribution of overlap 
coefficients for the “other” and “self” choices.
There are large individual differences in con- 
sistency, the exact ranges being from .28 to 
.98 for the “other” choices and from .35 to 
.94 for the “self.” 

As an approximation of a reliability coefficient
for the entire group, the mean overlap
coefficient was computed. The resulting values
for the “other” and “self” choices were almost
identical, .74 and .73. To the extent that
these values can be considered as conventional
reliability coefficients, they border on
respectability; but the effect is reduced by
the fact that for both sets of choices the majority
of all overlap coefficients were less than
.80.

² The formula is: $r_{12} = \dfrac{n_{12}}{\sqrt{n_1 n_2}}$, where $n_1$ is the number
of items chosen on the first trial, $n_2$ the number chosen
on the second trial, and $n_{12}$ the number common to both
trials.

It was found that there is only a moder- 
ate relationship between consistency on the 
“other” and “self” choices, the Pearson cor- 
relation between the two sets of overlap meas- 
ures being .57. 

Table 2 shows the means and variabilities 
of the number of items chosen on the first 
and second trials. It will be noticed that 
the mean number of items chosen increases 
slightly for both sets of choices on the second 
trial. Also, fewer items are chosen for the 
“self” choice than for the “other” choice; the 
variabilities are correspondingly smaller. 


Summary 


The AVA, an industrial personality test, 
requires the testee to select from 81 descrip- 
tive adjectives all those which anyone has 
ever used to describe him (“other” choices) 
and all those which he believes are truly descriptive
of himself (“self” choices). The
test-retest reliability of these choices was de- 
termined by means of the common elements 


Table 1

Distribution of Overlap Correlation Coefficients
between First and Second Trials

                         Frequency
Intervals of r12     “Other”    “Self”

.90-.99                  7          5
.80-.89                 15         18
.70-.79                 10         14
.60-.69                 10          7
.50-.59                  5          3
.40-.49                  2          3
.30-.39                  2          2
.20-.29                  1          0

Total                   52         52




Table 2

Means and Standard Deviations of the Number of AVA
Items Chosen on the First and Second Trials

             First Trial             Second Trial
          “Other”    “Self”       “Other”    “Self”

Mean       34.6       29.8         37.8       31.0
SD         17.0       13.3         17.8       11.6


correlation coefficient (proportion of overlap) 
for each of 52 individuals. 

There was considerable individual varia- 
bility in consistency. While the mean coeffi- 
cients of overlap for the two sets of choices 
were .74 and .73, in both sets of choices over 
half of the coefficients were less than .80. 
Consistency on one set of choices, as indi- 
cated by the correlation between the overlap 
coefficients of the two sets of choices, was 
not closely associated with consistency on the 
other (r = .57).

It would seem from these results that in- 
terpretations of the AVA might be disturbed 
by instability of response in an appreciable 
number of cases, the amount of disturbance 
depending on how much is staked on a given 
item in interpretation. There is the possi- 
bility, of course, that retest consistency itself 
might prove a useful indicator of personality, 
but this is not taken advantage of in the 
present method of utilization.


The Specialization Level Scale for the Strong Vocational Interest
Blank ¹


Milton G. Holmen
AFF Human Research Unit No. 2, Fort Ord, California ²


A report describing the development of the 
specialization level scales for the Strong Vo- 
cational Interest Blank and the Medical Spe- 
cialists Preference Blank for use in planning 
a medical career was made by this writer in 
the recent monograph by Strong and Tucker 
(2, pp. 45-47). The purpose of the present 
paper is to describe an investigation into the 
possibility of using the specialization level 
scale for the Strong blank in the counseling 
of men who are not planning a medical career. 

It was originally conceived that a scale 
might be developed for use in prediction of 
job satisfaction in the medical specialties. 
Such a scale was developed by assigning 
weights to those items on the Strong blank 
which showed differences between the re- 
sponses of medical specialists as a group and 
the responses of physicians-in-general. Since 
this scale “subtracted” the interests of one 
group of medical men from those of another 
group of medical men, it was expected that 
medical interest would be subtracted out, 
leaving a scale which might measure interest 
in specialization in non-medical as well as in 
medical areas. 

¹ This research was conducted at Stanford University
as a part of the Medical Specialists Research
Project, under Contract No. W-49-007-MD-483 with
the Surgeon General, U. S. Army. The opinions expressed
in this paper are those of the writer and do
not necessarily reflect those of the Department of
the Army.

This study was a part of a doctoral dissertation
(1). The writer wishes to express appreciation to
Prof. Donald W. Taylor and Col. Anthony C. Tucker
for guidance in conduct of the study, and to Dr.
Edward K. Strong, Jr. for suggestions on the research
and use of data from his files.

² This is a division of the Human Resources Research
Office, The George Washington University.

It might be of interest at this point to consider
what kinds of items are assigned different
weights in the Specialization Level scale.
On quite a few items, such as the occupations
of astronomer and author of a technical book,
a plus-weight is assigned to liking the item
and a minus-weight assigned to being indifferent
or disliking. On others, such as the
occupations of bookkeeper, auto salesman, 
and bank teller, disliking the item is as- 
signed a plus-weight, with minus-weights as- 
signed to indifference or liking. Items in 
which the indifferent response is not weighted 
are quite common, such as the occupation of 
certified public accountant and the feeling to- 
ward pet canaries for which plus-weights are 
assigned to disliking and minus-weights to 
liking. Liking of social problem movies is 
assigned a plus-weight; disliking them is as- 
signed a minus-weight. There are a few 
items, such as “chopping wood” and “pet 
monkeys,” for which only the indifferent re- 
sponse is weighted. Poetry, smokers, and the 
study of agriculture are assigned minus- 
weights for liking, plus-weights for indiffer- 
ence, and no weight for disliking. On several 
items, such as the occupations of music 
teacher and YMCA worker and the activity 
of solving mechanical puzzles, plus-weights 
are assigned to the response of indifferent and 
minus-weights to disliking. 
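
The weighting scheme described above can be illustrated with a short Python sketch. The signs of the weights for the four items shown follow the text; the unit magnitudes (+1, -1, 0), the response labels, and the function name are assumptions made for illustration, since the actual weight values are not given here.

```python
# Signs follow the text; unit magnitudes are hypothetical placeholders.
ITEM_WEIGHTS = {
    "astronomer": {"like": 1, "indifferent": -1, "dislike": -1},
    "bookkeeper": {"like": -1, "indifferent": -1, "dislike": 1},
    "certified public accountant": {"like": -1, "indifferent": 0, "dislike": 1},
    "poetry": {"like": -1, "indifferent": 1, "dislike": 0},
}


def raw_scale_score(responses: dict) -> int:
    """Sum the weight attached to each item's chosen response."""
    return sum(ITEM_WEIGHTS[item][answer] for item, answer in responses.items())


# Hypothetical blank: one response per item.
blank = {
    "astronomer": "like",
    "bookkeeper": "dislike",
    "certified public accountant": "indifferent",
    "poetry": "like",
}
score = raw_scale_score(blank)
```

A table keyed by item and response keeps the three patterns in the text (all responses weighted, indifference unweighted, only one response weighted) in a single structure.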


Results 


Relationship between Specialization Level
Scores and Educational Level. Are scores on
the specialization level scale related to amount 
of education? Such a relationship might be 
expected since high scores on the scale are 
obtained more often by persons engaged in 
occupations for which a considerable amount 
of specialized training is required. Mean 
scores on the scale were obtained for mem- 
bers of fourteen occupational groups. These 
means were obtained by use of the method 
recently reported by Strong (5) which pro- 
vides mean scores on any scale for any group 
from the summary data on the responses of 
that group to each item on the Strong blank. 

These fourteen occupational groups made
up four subject-matter clusters. It was predicted
that within each cluster the specialization
level mean scores would correlate positively
with the mean educational levels. The
correctness of this prediction is indicated by
the fact that, within each of the subject-matter
clusters, the mean specialization scores
were arrayed in the same order as the mean
educational levels. The educational levels
and specialization level scores of these groups
are presented in Table 1.


Table 1

Relationship between Mean Specialization Level
Scores and Mean Educational Levels

                             Mean              Mean
                             Specialization    Educational
                             Level Score       Level (years)

Medical Group
  Medical specialist         50.0              20*
  Physician                  39.5              19*
  Osteopath                  34.9              17.0
  Dentist                    32.8              14.9
Social Science Group
  Psychologist               54.8              19.0
  Social science teacher     41.8              16.4
Accounting Group
  C. P. A.                   43.9              14.3
  Accountant                 39.6              12.3
  Office worker              35.9              11.5
Physical Science Group
  Mathematician              48.8              18.8
  Physicist                  48.8              18.5
  Chemist                    45.5              16.8
  Math-science teacher       42.1              16.4
  Engineer                   41.8              15.4

* Educational level of group estimated.

The groups used for the comparisons pre- 
sented in Table 1 were those used for the con- 
struction of occupational scales on the Strong 
blank (4, pp. 694-717). Higher education- 
level means would undoubtedly be obtained 
from present members of these occupational 
groups, but the general trend would probably 
vary little from that indicated in Table 1. 

The relatively low scores of dentists and 
engineers may be taken to indicate that the 
scales measure a kind of specialization em- 
phasizing theoretical rather than technical 
considerations. Study in the occupations in 
which the highest scores were recorded ordinarily
involves more theoretical work than
those in which lower scores were obtained.
Though the evidence on this point is far from 
conclusive, the data suggest an hypothesis for 
further investigation. 

The mean score of psychologists is of par- 
ticular interest, since it was the highest mean 
score obtained, even higher than that of the 
medical specialists. An objection may be 
made to considering psychologists and social 
science teachers as working in the same sub- 
ject-matter area, but the social science teacher 
group was the only one at all appropriate on 
which data were available for making a com- 
parison. 

Specialization Level Scores and Success in 
Graduate School. The data above indicate 
that a positive relationship exists between 
specialization level scores and amount of edu- 
cation, but do not provide as precise an idea 
of what the scale measures as would be de- 
sirable. The specialization level scores of 
subgroups within two other occupations were 
therefore obtained to provide a more precise 
estimate of what the scale measures. 

Strong blanks (1927 edition) were ob- 
tained from two groups of former students 
of the Stanford Graduate School of Business. 
These blanks had been administered to the 
students during their first year in the School. 
Seventy-five of the men who filled out these 
blanks were later awarded the degree of 
Master of Business Administration (M.B.A.). 
The other 75 had dropped out or failed be- 
fore getting the M.B.A. The groups were 
matched by year to equalize any differences 
that may have existed from year to year in 
the admission policy of the School. Blanks 
from the classes of 1929 through 1941 were 
used. The mean standard score of men not
receiving the M.B.A. degree was 45.5; that
of men getting the degree was 45.5.

The Strong blanks (1927 edition) of 150 
chemists were also scored on this scale. The 
blanks used were a part of those obtained in 
development of the chemist scale on the 
Strong blank. Fifty of the blanks were from 
men who had Ph.D. degrees or had com- 
pleted at least seven years of college training. 
Fifty were from chemists with Master of Science
degrees, or with five or six years of college
training. Finally, 50 were from chemists
with three or four years of college training.
Most of this latter group held the degree of 
Bachelor of Science. All 150 were members 
of the American Chemical Society at the time 
of testing. None was a teacher or professor 
of chemistry. 

The mean standard scores for the three 
groups of chemists on the specialization level 
scale were as follows: Ph.D. group, 52.2; 
M.S. group, 47.8; and B.S. group, 46.5. The 
standard deviations for the three distributions 
were 8.1, 8.3, and 8.1, respectively. The 
critical ratio of the difference between the 
Ph.D. group and the M.S. group was 2.65, a 
difference which would occur by chance less 
than one time in a hundred. The critical 
ratio of the difference between the Ph.D. 
group and the B.S. group was 3.20, which 
would occur by chance less than one time in 
a thousand. The difference between the M.S.
group and the B.S. group was not significant
(C. R. of .81), but was in the expected direction.
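
A critical ratio of this kind can be sketched as the difference between two means divided by the standard error of that difference. The Python example below uses the common unpooled formula, which is an assumption, since the exact computation is not stated in the text; the function name is hypothetical. Because the published means and standard deviations are rounded, the result is close to, but not identical with, the reported value of 2.65.

```python
from math import sqrt


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between means divided by its (unpooled) standard error."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Ph.D. group (mean 52.2, SD 8.1) vs. M.S. group (mean 47.8, SD 8.3), 50 each.
cr_phd_ms = critical_ratio(52.2, 8.1, 50, 47.8, 8.3, 50)
```

With groups of 50, a ratio above about 2.58 corresponds to a two-tailed probability below .01 under the normal curve, which agrees with the statement that the Ph.D. versus M.S. difference would occur by chance less than one time in a hundred.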

An investigation was made to determine 
whether or not the differences between groups 
of chemists with respect to specialization level
scores might be due to differences in age between
members of the groups. The mean age
of the Ph.D. group was found to be 36.3 
years at the time of testing; the M.S. group 
averaged 34.0 years; and the B.S. group av- 
eraged 35.7 years. None of the differences 
between pairs of these means was found to 
be significant. 

Why should scores on the specialization 
level scale be related to amount of formal 
education for chemists but not for students 
of business administration? One reason for 
this difference may be that the two courses 
differ almost as much in purpose as in sub- 
ject matter. Generally speaking, the more 
formal training a chemist receives, the more 
specialized that training becomes. Research 
for the master’s thesis usually involves a 
minimum of six months of work on a nar- 
row, specific phase of chemistry. At least a 
year is spent on a single problem in prepara- 
tion of the doctoral dissertation. The pur- 
pose of the training for the M.B.A., on the 
other hand, is to provide a broad education 
for business executives, not to provide training
for specialists in any one phase of business.

Research on these two groups suggests that 
the scale does not measure mere liking or 
tolerance for education, but willingness to re- 
strict one’s activities to a very narrow field. 
This, of course, is the very essence of spe- 
cialization. One might object that train- 
ing toward the doctoral degree in chemistry 
should not be compared with training for the 
master’s degree in business administration. 
However, the M.B.A. is the highest degree 
ordinarily granted in the field of business ad- 
ministration, except to persons who plan to 
teach business subjects in colleges and universities.
Furthermore, a difference in the same direction as
that found between Ph.D.’s and B.S.’s in chemistry was
also found between M.S.’s and B.S.’s in that
field, although the latter difference was not
significant. Although this aspect of the research
on the scale may not be considered
conclusive, it appears on the basis of infor- 
mation available that the scale has been ap- 
propriately named. 

The question naturally arises, in connec- 
tion with such groups as chemists, whether 
the scale has any practical value. The data 
presented above indicate that the specializa- 
tion level may be of value in predicting 
whether a given student who plans to enter 
chemistry will enjoy the narrowing and 
“heightening” of work required of the Ph.D. 
However, if this scale is to be used as a basis 
for the counseling of a person planning to 
undertake a program of graduate study in 
chemistry, it must provide more information 
on this subject than does the chemistry scale. 
The correlation between the chemist and spe- 
cialization level scales (using the blanks of 
the 150 chemists discussed above) was found 
to be only .06, so the two scales certainly do 
not measure the same thing. 

To test whether the specialization level 
scale would provide more information than 
would the chemist scale with respect to 
amount of graduate study to plan for, two 
comparisons of the efficiency of these two 
scales were made. Both comparisons involved 
finding the significance of the difference between
proportions of overlap for the two
scales (3, pp. 75-76). The proportion of
overlap indicates the proportion of persons
in one group who would be classified as mem- 
bers of another group on the basis of scores 
on the scale in question. The first compari- 
son made used the blanks of the 50 chemists 
described above who had Ph.D. degrees or
had completed at least seven years of college
training. The second involved the 50
chemists who had a B.S. degree, or had com- 
pleted only three or four years of college 
training. Both groups were used in deter- 
mining the cutting points on which the pro- 
portions of overlap between them were based. 
For the Ph.D. group, the proportion of over- 
lap on the chemist scale was .42 and on the 
specialization level scale was .34. For the 
B.S. group, the proportion of overlap for the 
chemist scale was .46 and for the specializa- 
tion level scale was .36. For both of these 
groups, the differences between the two proportions
of overlap were significant at the
.01 level.
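
The idea of a proportion of overlap can be sketched in Python as the share of one group that falls on the other group's side of a cutting point. The rule used here for the cutting point (the midpoint between the two group means) is an assumption chosen for illustration, not the procedure of the cited reference, and the scores and function name are hypothetical.

```python
from statistics import mean


def proportion_of_overlap(group, other, group_is_higher=True):
    """Share of `group` that the cutting point would assign to `other`."""
    # Assumed cutting point: midpoint between the two group means.
    cut = (mean(group) + mean(other)) / 2
    if group_is_higher:
        misclassified = [score for score in group if score < cut]
    else:
        misclassified = [score for score in group if score >= cut]
    return len(misclassified) / len(group)


# Hypothetical scale scores for a higher-scoring and a lower-scoring group.
phd_scores = [60, 55, 50, 45, 40]
bs_scores = [50, 45, 40, 35, 30]
phd_overlap = proportion_of_overlap(phd_scores, bs_scores, group_is_higher=True)
bs_overlap = proportion_of_overlap(bs_scores, phd_scores, group_is_higher=False)
```

A smaller proportion means the scale separates the two groups more cleanly, which is the sense in which the specialization level scale (.34 and .36) is compared with the chemist scale (.42 and .46) above.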

The data obtained from the blanks of chem- 
ists indicate that the specialization level scale 
can be used in at least one area outside the 
field of medicine. The data obtained from 
blanks of students of business administration 
point out the limitations in the use of the 
scale. It cannot be used to predict success 
in graduate school without consideration of 
what field the counselee plans to enter. 


Theoretical Implications 


The material presented suggests that a basic 
dimension of interests has been identified, 
however crudely. Within the subject-matter 
areas tested, the specialization level scale ap- 
pears to separate those doing highly special- 
ized work requiring long training from those 
doing other kinds of work. It may be in- 
teresting to compare the specialization level 
scale with the occupational level scale. The 
occupational level scale separates business 
and professional men from those doing other 
work. The specialization level scale makes a 
similar sort of separation within some of the 
business and professional groups. That it 
measures something different from what is 
measured by the occupational level scale is 
indicated by the fact that the correlation between
the two scales, based on the blanks of
400 medical specialists, is only .07. The correlation
would undoubtedly be higher if it
were based on blanks of groups with a greater 
range of scores on these two scales, but the 
specialization level scale is of primary interest 
in groups for which the range of occupational 
level scores is restricted. 

The material also suggests an extension of 
the concept of point of reference as used in 
the development and interpretation of voca- 
tional interest scales (4, pp. 553-576). The 
essential aspect of this concept as developed 
by Strong is that the best reference group 
from which to construct the scales used for 
scoring a given individual’s blank would be 
one which included all, and only, the occupa- 
tions the individual might consider entering. 
This is a somewhat idealistic definition, and 
practical considerations prevent use of a dif- 
ferent set of scales for every person who 
takes the test. However, many of the per- 
sons taking the test are nearly enough alike 
to consider entering the broad group of oc- 
cupations and professions represented in the 
P, reference group. This reference group 
consists of the men engaged in the occupa- 
tions college men ordinarily enter (4, pp. 
712-713). The scales based on this group 
can be used only with respect to blanks of 
persons who do belong, or may be expected 
to belong, to the reference group on which 
these scales were based. 

The suggestion made here is that scales can 
be developed which are based on differences 
between two levels of one group, and that 
scores of these scales can provide valuable in- 
formation to persons not members of the 
groups on which the scales were constructed. 
Scales constructed to measure the differences 
between two subgroups within a single oc- 
cupational or professional group may provide 
measures that are relatively independent of 
the occupations on which they were con- 
structed. 


Summary 


The specialization level scale was developed 
for the Strong blank to separate medical specialists
from physicians-in-general. Research
reported here was undertaken to determine
whether or not this scale might provide useful
information about other occupational
groups, and thus identify specialization level 
as a dimension of interests comparable to oc- 
cupational level. 

Mean scores were obtained on the scale for 
ten occupational groups in three non-medical 
subject-matter areas and four groups within 
the field of medicine. Within each of these 
areas, the occupational groups were ranked 
in the same order by specialization level 
mean scores as by the mean educational level 
of their members. Further research indicated 
that chemists with Ph.D. degrees could be 
separated by this scale from those with less 
specialized training, but that the scale did not 
differentiate students who had qualified for 
the Master of Business Administration degree 
from those who had entered training for this 
degree but had failed or dropped out of school 
before receiving it. 

While the evidence presented is not con- 
clusive, it does indicate that a dimension of 
interests has been identified and that further 
research on the nature of this dimension is
merited. It indicates further that scales
measuring intra-group differences may be of
value for predicting with respect to occupa- 
tional groups not used in the construction of 
the scales, provided norms are available for 
these other occupational groups. 


Received June 30, 1953 
Interest Patterns for Certain Degree Groups on the Lee-Thorpe
Occupational Interest Inventory ¹


Andrew H. MacPhail 


Department of Education, Brown University 


For several years men students entering 
Brown University have been asked to fill out 
the Occupational Interest Inventory (Ad- 
vanced Series, Form A; Lee-Thorpe). The 
degree group patterns discussed here are 
based on scores made by 2,380 candidates for 
the A.B. degree, 170 for the Sc.B. in Chem- 
istry, and 578 for the Sc.B. in Engineering. 
For the purposes of this study it seems rea- 
sonable to consider these three degree candi- 
dacy groups as being validation groups. Cer- 
tainly this is true in the sense that students 
must meet specific requirements in order to 
be admitted to a particular degree candidacy, 
plus the effect of self-selection as manifested 
by the interest in seeking one degree rather 
than another. 

¹ Published by the California Test Bureau, Los
Angeles 28, California.

Means and standard deviations were computed
for each degree group on each part of
the Inventory and the significance of the differences
of mean scores made by the several
degree groups on each part of the Inventory
was then determined. Table 1 shows the
pattern of mean scores with percentile equivalents
for each of the degree groups. Some
idea of the degree of overlap of groups may
be inferred from the data in this table. However,
of the 30 critical ratios computed 22
were found to be significant at the one per
cent level.
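
The critical ratios in this study are differences between means divided by the probable error of the difference (diff./PE diff.). A minimal Python sketch of that computation follows. It assumes independent groups and the conventional relation PE = 0.6745 x standard error; the function names and the illustrative scores are hypothetical and are not taken from Table 1.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error (normal curve)


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two independent means, in probable-error units."""
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / (PE_FACTOR * se_diff)


def two_tailed_p(cr):
    """Two-tailed normal probability for a critical ratio given in PE units."""
    z = abs(cr) * PE_FACTOR          # convert PE units to standard-error units
    return math.erfc(z / math.sqrt(2))


# Hypothetical means, S.D.s and Ns, for illustration only.
cr = critical_ratio(23.0, 4.5, 2380, 20.0, 5.0, 578)
print(round(cr, 1), round(two_tailed_p(3.5), 3))
```

Under these assumptions a ratio of 3.5 corresponds to a two-tailed probability of roughly .02, in line with the level quoted for that ratio in the notes to Table 2.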

Table 2 shows that the mean scores made 
by the Arts group and Engineer group differ 
by an amount significant at the one per cent 
level on every part of the Inventory. The 
mean scores made by the Arts group differ 
significantly at the one per cent level from 
those made by the Chemist group on seven
of the ten parts of the Inventory, and on 


Table 1 


Degree Group Patterns on the California Occupational Interest Inventory 
Mean Scores * 


Arts 
(N = 2380) 


Raw Scores 


Mean S.D. %ile
Fields: 
Personal-Social 
Nature 
Mechanical 
Business 
Arts 
Sciences 



76.0 
37.0 
20.0 
72.5 
67.0 
26.5 


Types: 
Verbal 3.3 44 
Manipulative 12.7 2.4 
Computational 10.3 4.2 
73.4 8.8 


79.3 
42.0 
58.0 
Level 73.0 


* Percentile equivalents are the publisher’s. 


Chemists Engineers 
(N = 170) (N = 578) 


Raw Scores Raw Scores 


Mean S.D. %ile Mean S.D.


17.4 50 
19.3 7.0 
20.5 
17.6 7.0 
16.1 5.9 
32.1 5.4 


57.0 15.5 
41.0 19.4 

5.0 37.5 24.6 

43.0 19.0 

40.5 16.7 

80.5 


8.5 3.5 
13.7 
Table 2


Critical Ratios (diff./PE diff.) for Arts-Chemists; Arts-Engineers;
Chemists-Engineers on the California
Occupational Interest Inventory 

Note: All ratios are significant at the 1 per cent level, 
or better, except as indicated. 


                     Arts-        Arts-        Chemists-
                     Chemists     Engineers    Engineers
Fields:
  Personal-Social     +13.8        +35.0        + 6.5
  Nature                1.9‡         3.8          .26‡
  Mechanical           13.0         47.0         13.8
  Business            +15.4        +20.0          3.4†
  Arts                +17.2        +25.0          1.8‡
  Sciences             35.4         37.5         [illegible]
Types:
  Verbal              +25.0        +48.0         [illegible]
  Manipulative          6.6          4.5         [illegible]
  Computational       + 1.9‡       + 5.1         [illegible]
Level                   3.5†        [illegible]  [illegible]
Level the difference is significant at the two
per cent level. On Nature and Computa- 
tional the differences would not be considered 
significantly great, according to current con- 
vention, since the level of confidence does not 
even reach five per cent. In terms of mean 
scores the Chemists are not as clearly differ- 
entiated from the Engineers as either of these 
groups is from the Arts group. However, one 
per cent confidence levels are reached on the 
Personal-Social, Mechanical, Sciences, Verbal,
and Manipulative parts. On Business the three
per cent level is reached but the differences

† 3.4 is significant at the 3 per cent level, and 3.5 at
the 2 per cent.

‡ Not significant at the 5 per cent level.

(Note: Differences in favor of the first member of the 
group, such as Arts over Chemists, have a + sign in 
front of the critical ratio. Differences in favor of the 
second member of the group, such as Chemists over 
Arts, have no sign in front of the critical ratio.) 


Table 3


Critical Ratios (diff./PE diff.) for Arts-Chemists; Arts-Engineers; Chemists-Engineers on the
California Occupational Interest Inventory


[Table body not recoverable from the source. Column headings: Arts over Chemists; Arts over
Engineers; Chemists over Arts; Chemists over Engineers; Engineers over Arts; Engineers over
Chemists. Entries are the parts of the Inventory with their critical ratios.]


All critical ratios over 3.5 in this table are significant at the 1 per cent level, or better;
3.5 is significant at the 2 per cent level, and 3.4 at the 3 per cent.

on the other four parts would not commonly
be called significant. 

In Table 3 the 30 critical ratios, 22 of them 
significant at the one per cent level, are ar- 
ranged in descending order of magnitude. 
The specific purpose of this table is to give 
emphasis to the relative differential values of 
the ten parts of the Inventory with respect 
to the three degree groups, and it is a very 
simple matter to discover which part or parts 
of the Inventory have the greatest differential 
value and between which degree groups. 
Thus, for example, the table shows clearly
that the three parts having the highest differential
value are Verbal, Mechanical, and
Science, and that all three distinguish the
Arts and Engineer groups.

Needless to say, in practical use the mean 
scores for the ten parts of the Inventory 
(Table 1) would be rounded off to the near- 
est whole unit. The writer has made con- 
siderable use of this Inventory in student con- 
sultations and feels confident that the data 
presented here will enhance its value for such 
use.²

Received December 16, 1953. 
Early publication. 
² E. M. Hess and E. C. Allison gave valued help in
the computational work involved in the conduct of
this study.
Rated Stimuli


A. W. Bendig 
Two previous articles (1, 2) have reported
on the relationship between the reliability of 
self-rating scales and the number of cate- 
gories on the scale. Two types of internal 
consistency measures of reliability were in- 
vestigated: individual rater reliability, a meas- 
ure of the ability of single raters to discrimi- 
nate differences between the rated stimuli, 
and test reliability, a measure of the indi- 
vidual differences between raters in consist- 
ently assigning high or low ratings to the 
stimuli. Since this second type of reliability 
is a measure of what Guilford calls the “sys- 
tematic error” of raters (4, p. 273), in a rat- 
ing situation test reliability becomes a meas- 
ure of “rater bias,” i.e., the extent of the 
tendency for single raters to consistently 
over-rate or under-rate the particular stimuli 
presented to them. The results of the first 
two studies indicated that individual rater 
reliability is constant for self-rating scales 
with 5, 7, or 9 categories, drops slightly for 
11-category scales, and appears to fluctuate
for 2- and 3-category scales. Because of the 
inconsistent results with shorter scales, fur- 
ther investigation appears necessary. 

In the second paper (2) one of the hy- 
potheses suggested was that the reliability of 
ratings is a function of the heterogeneity of 
the rated stimuli. Stimuli that are distinctly 
different in the perceptual field of the rater 
should enable the rater to make simple and 
consistent judgments of difference between 
the stimuli, while stimuli that are quite simi- 
lar on the rated dimension would overlap con- 
siderably and lead to disagreements between 
different raters as to the relative order of the 
stimuli on this dimension. Volkmann (7) 
has summarized the evidence indicating that 
the width of a set of rating scale categories 
is partially dependent on the range of the 
stimuli presented to the rater to be judged. 
The rater tends to adjust the psychological 
length of the scale to fit the range of the 


stimuli. However, Volkmann points out (7, 
pp. 280-281) that this adjustment process is 
not completely flexible: the categories cannot 
be indefinitely compressed without a loss of 
the rater’s ability to scale stimuli. This sug- 
gests that rater reliability will decrease as the 
homogeneity of the stimuli increases and pro- 
vides the experimental hypothesis for this 
study. 


Procedure 


Stimuli. In the previous study (2) 236 Ss 
had rated the list of 20 foods used by Wallen 
(8) as to preference value. From their mean 
ratings these foods were ranked in order from
the most liked to the least liked food. Three
sublists each containing 10 foods were then selected
for the present study. List 1, the list
containing the most heterogeneous food stimuli, 
was composed of the top five and bottom five 
foods from this ranking. List 2, of intermediate 
heterogeneity, was composed by selecting in a 
double alternation pattern 10 foods from this 
ranking. Foods ranked 1, 4, 5, 8, 9, 12, etc.
were used for List 2. List 3, with the most 
homogeneous stimuli, contained the middle 10
foods: those ranked from 6 to 15. All three 
lists had a mean rank of 10.5 in the original 
ranking, but rank variances of 58.25, 34.25, and 
8.25. The original list of 20 foods had, of 
course, a mean rank of 10.5 and a rank variance 
of 33.25. 
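
The rank means and variances quoted above can be checked with a short Python sketch. It assumes the variance is taken about the mean rank with division by N (not N - 1), and that the double alternation pattern for List 2 continues past rank 12 as 13, 16, 17, 20; the function name is hypothetical.

```python
def rank_stats(ranks):
    """Mean and variance (dividing by N) of a list of ranks."""
    n = len(ranks)
    mean = sum(ranks) / n
    variance = sum((r - mean) ** 2 for r in ranks) / n
    return mean, variance


original = list(range(1, 21))                      # all 20 foods
list_1 = list(range(1, 6)) + list(range(16, 21))   # top five and bottom five
list_2 = [1, 4, 5, 8, 9, 12, 13, 16, 17, 20]       # double alternation (assumed continuation)
list_3 = list(range(6, 16))                        # middle ten, ranks 6 to 15

for name, ranks in [("Original", original), ("List 1", list_1),
                    ("List 2", list_2), ("List 3", list_3)]:
    print(name, rank_stats(ranks))
```

Each list gives a mean rank of 10.5, and the variances come out as 33.25, 58.25, 34.25, and 8.25, which agrees with the figures in the text.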

Scales. Four lengths of scale were used: con- 
taining 2, 3, 4, or 5 categories. The three de- 
scriptive statements used in the previous study 
(2) were used to verbally anchor these scales. 
Scales with 3 or 5 categories had an anchor under 
each of the end categories and also under the 
center category. The 4-category scale also had 
an anchor under each end category, but the cen- 
ter statement was located mid-way between the 
two center categories. For the 2-category scale 
the center anchor was omitted and the two end 
anchoring statements used with the other three 
scales were placed under the two categories. The 
lowest category on each scale was given a nu- 
merical weight of 1, the highest category num- 
bered 2, 3, 4, or 5, with intermediate categories 
numbered accordingly. 

Subjects. The twelve combinations of three 
stimuli lists and four lengths of scales were 
mimeographed with instructions on single sheets
and randomly distributed to 278 Ss. The Ss 
were students enrolled in daytime sections of in- 
troductory, social, applied, and educational psy- 
chology classes at this university during the 
spring, 1952-53, semester. These Ss recorded
their ratings on standard five-choice IBM answer 
sheets for convenience in the later statistical 
analysis. The raters were told that the re- 
searcher was investigating the adequacy of dif- 
ferent rating scales in assessing the food prefer- 
ences of college students and were requested to 
sign their names to the ratings. 

Analysis. The Ss by stimuli matrix of ratings 
for each of the twelve sub-groups of raters was 
analyzed by analysis of variance procedures. 
From these analyses intraclass estimates of the 
average reliability of individual raters were ob- 
tained (3), along with the reliability with which 
each stimuli-scale combination (test) encouraged 
rater bias among the raters in each subgroup (6, 
pp. 93-95). Judgment of the significance of each 
reliability coefficient was based upon the magni- 
tude of the F ratio associated with this coeffi- 
cient. 
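
As a rough illustration of this analysis, the sketch below partitions a raters-by-stimuli matrix into mean squares and derives the two reliability estimates. The article cites (3) and (6) for its formulas without printing them, so the intraclass expression for a single rater and the Hoyt-type expression for rater bias used here are assumptions about the standard forms, and all names and ratings are hypothetical.

```python
def mean_squares(ratings):
    """Two-way ANOVA mean squares; ratings[i][j] = rater i's rating of stimulus j."""
    n = len(ratings)            # number of raters
    k = len(ratings[0])         # number of stimuli
    grand = sum(sum(row) for row in ratings) / (n * k)
    rater_means = [sum(row) / k for row in ratings]
    stim_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_raters = k * sum((m - grand) ** 2 for m in rater_means)
    ss_stimuli = n * sum((m - grand) ** 2 for m in stim_means)
    ss_error = ss_total - ss_raters - ss_stimuli
    return (ss_raters / (n - 1),
            ss_stimuli / (k - 1),
            ss_error / ((n - 1) * (k - 1)))


def individual_rater_reliability(ratings):
    """Assumed intraclass form: average reliability of a single rater."""
    n = len(ratings)
    _, ms_stimuli, ms_error = mean_squares(ratings)
    return (ms_stimuli - ms_error) / (ms_stimuli + (n - 1) * ms_error)


def rater_bias_reliability(ratings):
    """Assumed Hoyt-type form: consistency of differences between raters."""
    ms_raters, _, ms_error = mean_squares(ratings)
    return (ms_raters - ms_error) / ms_raters


# Hypothetical ratings: 3 raters by 4 stimuli on a 3-category scale.
demo = [[1, 2, 3, 2], [2, 3, 3, 1], [1, 3, 2, 2]]
print(individual_rater_reliability(demo), rater_bias_reliability(demo))
```

In this form the F ratio for each coefficient is simply the corresponding mean square divided by the error mean square, which is the basis on which the text judges significance.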


Results 


The results of the twelve analyses of vari- 
ance are given in Table 1 and the obtained 
reliability estimates are summarized in Table 
2. Individual rater reliability increased ap- 
proximately linearly as a function of the 
heterogeneity of the stimuli with the average 
reliabilities of Lists 1, 2, and 3 being .22, .15, 
and .06. Rater reliability rose as the number
of scale categories was increased from 2,
through 3, to 4, and dropped slightly, but 
consistently, at 5 categories. To assess the 
significance of these findings, Kendall’s non- 
parametric rank coefficient W (5) was used. 
Ranking the rater reliabilities across the rows 
in Table 2 gave a W of .558 which has an 
approximate probability of .07 of occurring 
by chance (5, pp. 146-147). Ranking the 
same reliabilities down the columns resulted 
in a W of .333 which is significant at the .01 
level. Inspection of the rater reliabilities in 
Table 2 suggests little interaction between 
heterogeneity of the stimuli and length of 
scale, although no statistical test of such an 
interaction was possible. The measures of 
rater bias in Table 2 present a somewhat dif- 
ferent picture. Rater bias also increased 
from 2 to 4 categories and slightly declined 
with the 5-category scale, but the results are 
somewhat less consistent than with rater re- 
liability. Also, List 2 was generally the best 
set of stimuli in encouraging systematic rater 
error and List 1 the least subject to bias, but 
these results are manifestly a function of the
length of the scale. For example, when the 
4-category scale was used, List 1 was found 
to be most biased and List 2 least biased. 
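
Kendall's W, used above, can be sketched in a few lines of Python. This is the untied form, W = 12S / (m²(n³ - n)), for m rankings of n objects; the function name and the example ranks are hypothetical and do not reproduce the values in Table 2.

```python
def kendalls_w(rank_rows):
    """Kendall's coefficient of concordance for m rankings of n objects (no ties)."""
    m = len(rank_rows)              # number of rankings (e.g. lists or scale lengths)
    n = len(rank_rows[0])           # number of objects ranked
    totals = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical ranks: three rankings of four objects.
perfect = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
partial = [[1, 2, 4, 3], [2, 1, 4, 3], [1, 3, 4, 2]]
print(kendalls_w(perfect), kendalls_w(partial))
```

W is 1 when every ranking agrees and falls toward 0 as the rankings become unrelated.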


Table 1 


Reliability Coefficients and Significance Tests for Each Rating Group 


Number of 
Raters 


Number of 
Categories 
Table 2


Summary of Reliability Coefficients as Functions of 
Stimuli Heterogeneity and of the Number 
of Rating Scale Categories 


Number of Scale Categories 


List 2 3 4 5 
Pie .26** 2 
o77* 16° .15** 1 


04* O01 .0o** 06 


Mean 
Individual 1 BP nas 
Rater 2 
Reliability 3 
Mean .09 Al 17 14 


Individual 
Rater 2
30 : 33


Mean .34 49 5: 44 


* Group results significant at the .05 point 
** Significant at the .01 point 


Applying the same W method to the bias 
measures gave values of .083 (ranking across 
rows) and .021 (ranking down columns), 
neither of which is significant at the .90 
level. For rater bias the interaction of 
stimuli heterogeneity and length of scale ap- 
pears to be the most important source of 
variation among the reliability coefficients. 


Discussion 


The results of this study have partially clari- 
fied the relation between rater reliability and 
scale length for short rating scales used for self- 
rating. Scales with 2 categories are less reliable 
than those with 3 categories, which confirms the 
results in a previous study (2). A 4-category 
scale yields somewhat more reliable stimuli rat- 
ings than either a 3-category or a 5-category 
scale, with a 5-category scale being slightly more 
reliable than a 3-category scale. This last state- 
ment (3 vs. 5 categories) contradicts the previous 
study (2), and probably only an appeal to omnipresent
sampling fluctuations can reconcile this
discrepancy in the results of the two studies, 
especially when we note that our first study (1) 
found no difference in rater reliability between 
3- and 5-category scales. The general conclusion 
of no difference in rater reliability with 3- and 
5-category scales appears warranted when all 
three studies are considered. The small, but 
significantly consistent superiority of 4-category 
scales is interesting in light of the hypothesis 
suggested by Jones that rating scales with an
even number of categories may yield stimulus 
ratings of higher reliability. The inclusion of a
center category in a scale with an odd number of 
possible responses may encourage the rater 
“error of central tendency” (4, p. 272) and re- 
duce rater reliability. This hypothesis needs 
further investigation. 

The hypothesis derived from Volkmann (7) 
that raters cannot compress their psychological 
reference scale to give reliable ratings of homo- 
geneous stimuli was confirmed. The suggestion 
that to achieve reliable ratings the rated stimuli 
should cover a wide range of the rating continuum
appears eminently reasonable and, fortunately,
is supported by the experimental findings.

The somewhat inconsistent fluctuations in rater 
bias noted in the Results section of this paper 
preclude any sweeping generalizations. Rater 
bias, in this investigation, did not appear to be 
a consistent function of either scale length or of 
stimuli heterogeneity.

In the Procedure section it was noted that List
2 contained stimuli drawn from the entire range
of the stimuli used previously and best duplicated
the stimuli variance of the original list.
This may be an explanation for the slightly
larger rater bias found for List 2. Thorndike
(6, pp. 229-230) has pointed out that, when the 
responses to test items (food stimuli) are highly 
correlated (as they usually are with ratings),
using items selected from a large range of item 
difficulty level (food preference) will encourage 
subject discrimination. Since Lists 1 and 3 con- 
tained stimuli only from the center or from the 
ends of the preference continuum the obtained 
drop in bias for these stimuli lists could be expected.
However, since this explanation is post
hoc and based upon inconsistent evidence it is 
somewhat unconvincing. 

Two cautions must be emphasized. The results
of this and the previous two studies (1, 2)
can only be tentatively generalized to the rating
situation where Ss are requested to report on
their own feelings, preferences, prejudices, etc.
Also, the results can be applied only to ratings
by relatively naive raters as represented by college
students. We cannot hope that our findings
will be confirmed without modification when
scales are used to rate more objective stimuli
or are used by more experienced raters.


Summary 


Three lists of 10 food stimuli were selected
so that the lists varied in the heterogeneity
of the stimuli. Preference ratings
were collected from 278 Ss using rating scales
with 2, 3, 4, or 5 categories. Rating reliability
was highest with the most heterogeneous
list and with the 4-category scale and
was lowest with the most homogeneous list
and the 2-category scale. Rater bias results
were more tentative, with the list of intermediate
stimuli heterogeneity and the 4-category
scale most subject to systematic rater
error on the part of the Ss.


Received July 29, 1953. 
Scholastic Achievement of Extension and Regular College Students


Alexis M. Anikeeff 
Oklahoma A & M College 


Should students receive regular college 
credit for work completed under an off-cam- 
pus extension program? In order to permit 
a more objective answer to the question, a 
study was initiated to evaluate the scholastic 
achievement of extension students. 


Procedure 


Identical personnel management examina- 
tions were administered to approximately 39 
male, evening extension students and to a 
similar number of male, regularly enrolled 
students, before and after exposure to course 
subject matter. The same procedure was fol- 
lowed twice with each group. On the first 
occasion, the examination covered six chap- 
ters, part 5, of a standard personnel manage- 
ment textbook (3), and lecture material. On 
the second occasion, the examination again 
covered six chapters, part 7, and lecture ma- 
terial. Each examination contained 50 four- 
choice test items. In addition, each exami- 
nation had been twice refined by rejection of 
items which failed to meet established in- 
ternal consistency standards. About 75% of 
the test items on each examination covered 
textbook information, while 25% of the items 
tested knowledge of lecture material. The 
same individual delivered identical lectures to 
both groups of students, and also adminis- 
tered the examinations. 

Although the course in personnel manage- 
ment is categorized as a junior level course, 
and the membership is predominantly com- 
posed of junior level students, approximately 
one-fifth of the students were seniors and one- 
tenth of the students were classified as sopho- 
mores. The course was a graduation require- 
ment for five of the 39 students. On the 
other hand, 34 students selected the course 
as an elective. Not more than one-quarter of 
the regularly enrolled day students were veterans
of World War II.

The evening extension class was composed 
entirely of World War II veterans. Each 
veteran received thirty dollars per month for 
attending and being enrolled in the course.
All members were enrolled under one of the 
following two provisions: 1. College degree 
program, or 2. Two-year college certificate 
program. The distinction between class mem- 
bers classified under the two programs was 
almost completely ambiguous. Selection of 
either program rested solely with the student. 
Extension students shifted from one program 
to the other program indiscriminately, and 
with considerable vacillation. 

Both groups of students were matched ac- 
cording to initial performance on each of the 
tests. Standard errors of the means were 
corrected for matching on an infallible cri- 
terion in the case of a “before” and “after” 
comparison of the same group on the same 
test. All other standard errors of the means 
were corrected for matching on a fallible cri- 
terion; namely, initial performance on the 
test. Formulas for obtaining the corrected 
standard errors of the means were available 
in Guilford’s publication (1, p. 196). The 
corrected standard errors of the means were 
employed in formulas used to establish the 
significance of the difference between arith- 
metic means of correlated data. 
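
For orientation, the usual textbook expression for the standard error of a difference between two correlated means can be sketched as follows. This is an assumption about the general form only: the specific corrections for matching on a fallible or infallible criterion cited from Guilford are not reproduced here, and the function names and numbers are hypothetical.

```python
import math


def se_diff_correlated(se_1, se_2, r):
    """Standard error of the difference between two correlated means."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2 - 2 * r * se_1 * se_2)


def ratio_of_difference(mean_1, mean_2, se_1, se_2, r):
    """Difference between means divided by its standard error."""
    return (mean_1 - mean_2) / se_diff_correlated(se_1, se_2, r)


# Hypothetical pretest and posttest means with their standard errors.
print(ratio_of_difference(19.9, 16.3, 0.6, 0.5, 0.4))
```

A positive correlation between the paired scores shrinks the standard error of the difference, which is why matched comparisons can detect smaller differences than unmatched ones.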

Standard errors of standard deviations were 
corrected for matching on an infallible cri- 
terion or on a fallible criterion as the data 
dictated. Peters and Van Voorhis (2, p. 143) 
supplied the formulas for this computation. 
The corrected standard errors of standard 
deviations were used in formulas for deriving 
the significance of the difference between 
standard deviations of correlated data. 

For the purpose of securing a value to be 
used in the formulas for determining the sig- 
nificance of the difference between arithmetic 
means and between standard deviations of 
correlated data, a coefficient of correlation
was obtained for each of twelve comparisons. 
When the number of cases in one distribution
exceeded the number of cases in the distribution
with which the first distribution was
compared, the superfluous unmatched cases
were dropped, as suggested by Peters and Van
Voorhis (2, p. 449). Approximately 12½%
of the cases were dropped for one comparison,
10% of the cases were dropped for two
comparisons, 2½% of the cases were dropped
for four comparisons, and none were dropped 
for the remaining five comparisons. In all 
cases, the Pearsonian product-moment coeffi- 
cient of correlation was employed. 

Six comparisons of data were made for 
each of the two tests which were adminis- 
tered: 1. Extension students before studying 
vs. extension students after studying, 2. Day 
students before studying vs. day students 
after studying, 3. Extension students before 
studying vs. day students before studying, 4. 
Extension students before studying vs. day 
students after studying, 5. Extension students 
after studying vs. day students before study- 
ing, 6. Extension students after studying vs. 
day students after studying. For the pur- 
pose of this investigation, both presumed and 
actual studying are considered to be studying. 


Results 


When considering the data found in Table 
1, it is well to recall that the results are 
based upon tests which have been refined 
twice. In addition, it is noteworthy that 
each test contains 50 four-distracter ques- 
tions. Under these circumstances, an arith- 
metic mean of 12.50 correct answers can be 
obtained by random guessing. A further
analysis of random guessing indicates that 
an arithmetic mean of 18.50 must be ob- 
tained in order that random guessing can be 
discounted at the 5% level of confidence. 
Moreover, a mean of 20.39 must be obtained
to reach the 1% level of confidence. 
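
The article does not say how these guessing thresholds were derived. They are reproduced by a normal approximation to the binomial distribution of a single score, with z-values of 1.96 and 2.576, and that is the assumption behind the following Python sketch; the constant and function names are hypothetical.

```python
import math

N_ITEMS = 50        # items per examination
P_CORRECT = 0.25    # chance of guessing a four-choice item correctly

chance_mean = N_ITEMS * P_CORRECT
chance_sd = math.sqrt(N_ITEMS * P_CORRECT * (1 - P_CORRECT))


def threshold(z):
    """Score lying z binomial standard deviations above the chance mean."""
    return chance_mean + z * chance_sd


print(chance_mean)                    # chance expectation
print(round(threshold(1.96), 2))      # 5% level
print(round(threshold(2.576), 2))     # 1% level
```

Under this assumption the chance mean is 12.5 and the two thresholds come out at about 18.50 and 20.39, matching the figures in the text. The standard deviation used is that of a single score, not the standard error of a mean based on 39 cases.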

Evidence that factors other than random 
guessing contribute to a naive student’s test 
score is documented by published research 
and the author’s analysis of test responses. 
As a consequence it is reasonable to assume 
that random guessing represents a very con- 
servative criterion of scores which could be 
obtained by students who were unexposed to 
course subject matter. In view of this situation,
the fact that the mean of extension students
after studying does not differ significantly
from the random guessing mean on
the first experimental test, and fails to reach
the 1% level on the second experimental test,
is worthy of consideration by extension pro- 
gram administrators. 

The significance of differences was derived 
for testing the reliability of the differences be- 
tween arithmetic means, between standard 
deviations, and between the obtained coeffi- 
cients of correlation and true coefficients of 
correlation assumed to be zero. A table illustrating
specific details of the foregoing analysis
was omitted to reduce publication costs.
However, the results indicate that the differ- 
ences between SD’s and AM’s of the exten- 
sion student distribution after studying are 
not significantly different from those obtained 
from the regularly enrolled students before 
studying for the first experimental examina- 
tion. 

On the second experimental examination, 
the AM of extension students after studying 
does not differ significantly from the AM of 
regularly enrolled students before studying, 


Table 1 


Performance of Extension and Day Students Under Varying Conditions of 
Test Administration 


Number 
Cases 


Sequence Exam Day


Pretest 39
Posttest 39
Pretest 2 40
Posttest 39


* Day students vs. extension students 


Standard 
Deviation 


Arithmetic 
Mean 


Day Ext 
1 144 3.8 7 
2 17.8 = t 
1 16.3 3 
2 19.9 : 3 


Day Ext 
as was true for the first examination. However,
the difference between SD’s is significant
at the 5% level of confidence for the
same comparison on the second examination. 
In addition, the estimated r is not significantly
non-zero for the same comparison on
the first examination, although the estimated
r on the second experimental examination
does differ from zero at the 5% level of confidence.

Both extension and regularly enrolled stu- 
dents obtained higher scores after studying 
than they did before studying, on each of 
the administered examinations. However, the
regularly enrolled students achieved higher
scores than extension students, both before
studying and after studying.


Discussion 

If extension students are to receive college 
credit for courses offered in an off-campus 
program, it is reasonable to expect that the 
achievement of extension students should be 
comparable to the achievement of regularly 
enrolled students. Practical considerations
frequently becloud the issue. Day students
ostensibly exert their major effort toward the
acquisition of knowledge prescribed by school 
administrators. Conversely, extension stu- 
dents in evening classes exert their major 
effort toward earning a livelihood. In the 
actual situation a considerable overlapping 
of goals may occur. Nevertheless, differences 
in goal orientation could account for differ- 
ences in motivation, and in this manner be 
reflected in differences of achievement be- 
tween extension students and regularly en- 
rolled students. 

Physiological and psychological types of 
fatigue probably exert their insidious influ- 
ences on both the instructor and the students. 
The instructor frequently drives many miles, 
bolts down unpalatable food in unfamiliar sur- 
roundings, and talks for three hours about 
subject matter previously discussed during 
the day to heavy-lidded students who eagerly 
await class dismissal and reunion with their 
families. 

Differences in educational backgrounds
between extension and regularly enrolled
students, as well as other factors, may also
contribute to the difference in achievement scores.
However, despite the reasons for the differ- 
ences between the achievement of extension 
and regularly enrolled students, if further 
studies support results found in this investi- 
gation, the case for granting college credit for 
extension work will be severely challenged. 
If the administrators persisted in granting 
college credit under these circumstances, the 
administrators would be honor bound to give 
college credit to the regularly enrolled college 
students solely on the basis of payment of
registration fees. Under these conditions,
knowledge of lecture and textbook material
would be optional.


Summary 


Identical pretest and posttest examinations 
were twice administered to a group of exten- 
sion students and to a group of regularly en- 
rolled college students. Six comparisons of 
educational achievement between groups were 
obtained for each of two examinations. 

1. The arithmetic means of regularly en- 
rolled students on pretests did not differ sig- 
nificantly from the posttest arithmetic means 
of extension students on identical examina- 
tions. 

2. The posttest mean of extension students 
on the first examination did not differ signifi- 
cantly from a mean which could be obtained 
by random guessing. On the second examina- 
tion, the posttest mean of extension students 
differed at the 5% level of confidence from
the mean which could be obtained by random 
guessing. 

3. In view of the obtained results, a ques- 
tion was raised about the advisability of 
granting college credit for work performed in 
evening off-campus extension courses.


Index of Collaboration for Test Administrators


Alexis M. Anikeeff 
Oklahoma A & M College 


Freedom to secure appropriate information 
from fellow test-takers during the administra- 
tion of an examination may be considered an 
inalienable right by some individuals. Others 
may denounce this procedure as a scourge 
upon the American educational system. Per- 
haps both groups will agree that from an un- 
moral, and a solely objective viewpoint, such 
a practice is undesirable because it lowers the 
reliability and validity of the testing process. 

Proctoring examinations, distributing sev- 
eral forms of an examination, rearranging the 
same set of questions, seating test-takers at 
maximum distances from each other, and 
haranguing test-takers that virtue will tri- 
umph are methods which have been used 
with varying degrees of success. Concomit- 
ant with the foregoing procedures, would it 
be possible to develop some method or tech- 
nique which could indicate the presence of 
collaboration on multiple-choice tests even 
though the act of collaboration went un- 
noticed by the test administrator or the 
proctor? The purpose of this study was to
develop and test the usefulness of such a 
technique. 

A scrutiny of examination papers submitted 
by two individuals who were obviously col- 
laborating with each other during the ad- 
ministration of an examination suggested the 
feasibility of comparing the distracters se- 
lected by each individual for his incorrectly 
answered questions. Under somewhat similar
circumstances, Bird (1) found that a
comparison of incorrectly answered multiple-choice
and completion questions offered definite
possibilities of detecting collaboration.
For the purpose of the present study,
collaboration is defined as any voluntary or
involuntary dissemination of information on
the part of one test-taker for the purpose of 
improving the test score of another test-taker 
during the administration of an examination. 

Documented knowledge indicates that ran- 
dom guessing would permit one question out 
of four questions to be answered correctly 
when four optional answers are presented for
each question and one of the four answers is
always correct. Random guessing by defini- 
tion implies a complete absence of knowledge 
about the subject matter tested. Therefore, 
if four wrong answers are substituted for four 
correct answers, random guessing would nev- 
ertheless permit one of the wrong answers to 
be selected by chance alone. However, to the 
extent that more than one of the arbitrarily 
selected wrong answers is chosen under these 
circumstances, something other than random 
guessing may be operating. 

The index of collaboration assumes that 
within specific levels of confidence it is pos- 
sible to detect collaboration between test- 
takers. A comparison is made of distracters 
selected for incorrectly answered questions by 
two or more individuals. Random guessing, 
as previously indicated, would permit one- 
fourth of the total number of incorrectly an- 
swered four-choice questions selected by one 
test-taker to be answered with identical dis- 
tracters by an adjacently seated individual. 
A simple illustration of the foregoing situa- 
tion could be portrayed by two individuals 
who managed to answer 20 identical ques- 
tions incorrectly on a 50-question examina- 
tion which employed four-choice questions 
and one of the optional answers was always 
correct. It is reasonable to believe that five 
of the 20 identical questions could be an- 
swered incorrectly by using identical distrac- 
ters. The number of identical incorrect an- 
swers above five needed to be shared by both 
test-takers before collaboration is indicated 
can be determined by the use of a simple 
formula (2) for the standard error of the 
frequency, $\sqrt{Npq}$, with p = 1/4 and q = 3/4
for four-choice questions, or read from Table 1,
which is based on this formula. Symbol N
in the formula refers to the number of ques- 
tions answered incorrectly under these cir- 
cumstances. 
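
The arithmetic of this illustration can be sketched in a few lines of Python. This is a minimal sketch and not the procedure used to construct Table 1: the function names are hypothetical, the chance probability is taken as one-fourth as in the text, and the critical ratio z is an assumption which the reader must supply for the level of confidence desired.

```python
import math


def chance_agreement(n_shared_wrong, n_choices=4):
    """Expected chance matches and their standard error, sqrt(N*p*q).

    n_shared_wrong is N, the number of identical questions answered
    incorrectly by both test-takers. p = 1 / n_choices follows the
    one-fourth convention used in the text for four-choice questions.
    """
    p = 1.0 / n_choices
    q = 1.0 - p
    expected = n_shared_wrong * p
    std_error = math.sqrt(n_shared_wrong * p * q)
    return expected, std_error


def matches_needed(n_shared_wrong, z, n_choices=4):
    """Smallest whole number of identical distracters exceeding
    expected + z * standard error (z is an assumed critical ratio)."""
    expected, std_error = chance_agreement(n_shared_wrong, n_choices)
    return math.floor(expected + z * std_error) + 1


if __name__ == "__main__":
    expected, std_error = chance_agreement(20)
    print(expected, round(std_error, 2))
    print(matches_needed(20, z=1.96))
```

For the 20 shared wrong questions of the illustration, the expected number of chance agreements is 5 and the standard error is about 1.94. With an assumed critical ratio of 1.96, nine identical distracters would be needed before chance becomes an unlikely explanation.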

Procedure 

An effective measure of the collaboration in- 
dex’s validity is an admission of guilt by indi- 
viduals who have been identified as collaborators 
by the index. At least ten cases, involving
twenty individuals, have been validated by direct
admission, indirect admission (e.g., “I won’t deny
it, but I certainly won’t admit it”), or by
objective observation in isolated instances when an
individual was found flagrantly copying answers
from his neighbor and was permitted to continue
in this manner for the duration of the
examination. Under these circumstances it would appear
that the index of collaboration has been
successful in identifying every known case of
collaboration within the past two years. Unfortunately,
little was known about the success of the
collaboration index in the identification of the
unknown cases of collaboration.


Table 1

Index of Collaboration for Use with
Four-Choice Questions

Number of Identical Questions
Wrong on Examination Paper B
Using Same Distracters as Found
on Examination Paper A Needed
to Establish Existence of
Collaboration at Various
Levels of Confidence

Number of
Questions
Examination

In order to further test the effectiveness of the 
index of collaboration, a group of 17 regularly 
enrolled college students were asked to collabo- 
rate with each other during a second administra- 
tion of a personnel management classroom ex- 
amination which contained 50 four-choice ques- 
tions. As a result of lengthened summer session 
classroom periods, students were asked to return 
to the classroom 45 minutes after the beginning 
of the first examination for the purpose of hear- 
ing an important announcement. When the stu- 
dents reconvened, they were informed that they 
would receive an A-grade weighted equal to that 
of one regular examination if they would retake 
their previous examination under collaboration 
conditions. The students were told that they 
must collaborate with one or with several stu- 
dents in order to receive their reward. The stu- 
dents were further informed that they were in a 
simulated regular examination situation, and con- 
sequently, any detectable case of collaboration 
would be discouraged by the instructor. 

Students kept a record of the number of an- 
swers which they obtained from each student 
with whom they collaborated. This information 
was retained by each student until the collabora- 
tion analysis was completed in order to insure 
greater objectivity of the analysis.

Results 

Detailed results are available in Table 2,
where the number of identical wrong
distracters is indicated as being shared by each
student paired with every other student. Of
the 17 students participating, collaboration
could not be uncovered for seven, collaboration
was found operating on the 5% level for
two students, and on the .001% level for
eight students.

A comparison of collaboration index analy- 
sis with the data of extent of collaboration 
kept by each student, Table 3, reveals that, 
in the experimental situation, the index of 
collaboration failed most significantly in the 
identification of student L who secured as 
many as twenty answers by copying from 
three other students. On the other hand, the 
index was able to identify two students, G 
and QO, at about the .001% level of confidence 
when both cooperated closely with each other 
and secured only five answers from each other. 
Other cases appear to substantiate the belief
that the index is most discriminating in the
identification of one-way collaboration when 
an individual copies a sizable number of an- 
swers from a single test paper. A substan- 
tially smaller number of answers apparently 
need to be shared when active two-way col- 
laboration is involved. Moreover, the col- 
laboration index appears unable to identify 
with any particularly useful degree of ac- 
curacy, the individual who copies answers 
from several individuals when the individuals 
in question fail to reciprocate his behavior. 
In addition, the 5% level of confidence was 
found much too crude for accurate identifi- 
cation of collaboration. 


Discussion 


The collaboration index is premised upon the 
operation of random guessing. Consequently, 
the effectiveness of the index will vary directly
with the degree that random guessing is operat- 
ing. Since the classroom examination adminis- 
tered to the experimental group was refined 
twice, and the distracters lacking pulling-power 
were eliminated, it is reasonable to believe that 
random guessing was present to a greater extent 
in the experimental situation than it would have 
been if a non-refined examination were used. 

Despite the refinement procedure, it would
nevertheless be safe to assume that all distracters
do not have equal attraction values. An analy- 
sis of distracter effectiveness made by tallying 
the number of times a distracter is selected 
could suggest whether a modification or an ad- 
justment is needed in order to secure a more 
accurate indication of collaboration. For ex- 
ample, if on a four-choice question two distrac- 
ters are never chosen, and one of the remaining 
two answers is the correct one, then by defini- 
tion an individual has only one chance of making 
an error. To the extent that a considerable 
variation is found in the number of effective 
distracters among the questions on the whole ex- 
amination, the application of the principle of 
binomial expansion may prove more useful than 
the standardized index of collaboration. 
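
The binomial approach mentioned above can be sketched as follows. This is an illustrative sketch under stated assumptions, not a procedure given in the study: the function name is hypothetical, and the per-question chance probabilities of an identical distracter are values the test administrator would have to estimate from an analysis of distracter effectiveness.

```python
def match_tail_probability(match_probs, k):
    """Probability of at least k identical distracters by chance.

    match_probs[i] is the assumed chance probability that two
    test-takers pick the same distracter on shared wrong question i.
    When all entries are equal this is the ordinary binomial tail;
    unequal entries allow for questions whose distracters differ
    in effectiveness.
    """
    dist = [1.0]  # dist[j] = probability of exactly j matches so far
    for p in match_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1.0 - p)   # no match on this question
            nxt[j + 1] += mass * p       # chance match on this question
        dist = nxt
    return sum(dist[k:])


if __name__ == "__main__":
    # Twenty shared wrong questions, one-fourth chance of a match on each.
    print(round(match_tail_probability([0.25] * 20, 9), 3))
```

With twenty shared wrong questions and a chance probability of one-fourth on each, nine or more identical distracters would be expected by chance about four times in one hundred.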

Although the index of collaboration used in 
this study, Table 1, is developed for use with 
four-choice questions, a similar index could be 
developed for examinations using any other num- 
ber of distracters. Moreover, in the event that 
a test administrator of a four-choice examination 
is unaware of the effectiveness of his distracters, 
and if an analysis of distracters is for some rea- 
son impractical at the moment, he may feel more 
secure in using an index based upon three-choice 
questions. Under these conditions he would as- 
sume that only three of the four optional an- 
swers are effectively operating in terms of at- 
traction for any four-choice question. 

Table 2

Paired Comparison of the Number of Identical Wrong Distracters Shared by Experimental
Group Members During Collaboration Examination

Student Code
No. Wrong on Exam

* Probability of collaboration 19:1.
† Probability of collaboration 9,999:1.


Table 3

Number of Answers Indicated by Students as Being Copied from Other Class Members
During Collaboration Examination

No. Wrong

Note: Student in vertical column copies from student in horizontal column.
* Erroneously identified as collaborating at 5% level, one-way collaboration.
† Correctly identified as being involved in collaboration at .001% level.
‡ Involved in collaboration through third individual.
§ Presumed victims of 5% level erroneously identified collaborators.


The usefulness of the index of collaboration is
not limited solely to identification of
collaboration. Allaying the suspicions of collaboration
may under some circumstances prove more
heartening than the establishment of the act of
collaboration. For example, did the student who
failed three regular quizzes earn an excellent 
grade on his final examination as a result of in- 
creased motivation and preparation or through 
collaboration? Available evidence supports the 
contention that the use of the index of collabora- 
tion has motivated many test-takers to become 
more wary of their fellow classmates during the 
administration of classroom examinations. 

In view of the available data it is reasonable 
to believe that the index of collaboration is able 
to identify cases of large scale collaboration 
which could not otherwise be identified, owing to 
the polished skill of the collaborators. On the 
other hand, the index of collaboration is rela- 
tively ineffective in identifying the individual 
who secures information sporadically and by 
means of furtive glances at numerous papers 
which surround him during his examination pe- 
riod. 

Summary 


An index for the identification of collabora- 
tion between test-takers during the adminis- 
tration of an examination was developed, and 
its effectiveness tested on an experimental
group.


1. The index of collaboration was found
reasonably effective in the identification of 
collaboration despite the inability of the ad- 
ministrator to detect its existence during the 
administration of the examination. 

2. The index of collaboration was most ef- 
fective in the identification of large scale one- 
way collaboration involving the copying of at 
least 16% of the answers from a single ad- 
jacent test-taker. 

3. Two-way active collaboration was identi- 
fied when only 10% of the answers were 
shared by two individuals. 

4. Identification of collaboration was least 
effective when an individual copied answers 
from several other test-takers in a sporadic 
and unsystematic manner.


Group Manual Dexterity in Women *


Andrew L. Comrey and Gerald Deskin 
The University of California at Los Angeles 


Two previous articles (1, 2) described the 
results of experiments with a group manual 
dexterity task in men. The present experi- 
ment is a duplicate of the second of the previ- 
ous experiments using women college stu- 
dents as subjects instead of men. Although 
a complete description of the experimental 
procedures can be found in the previous re- 
ports, a brief summary will be given here. 


Procedure 

Sixty pairs of volunteer women university 
students were given six trials on a modifica- 
tion of the Purdue Pegboard Assembly Task. 
Instead of making each successive assembly 
by starting with the preferred hand, the sub- 
ject was required to alternate the hand used 
to place the first element of each successive 
assembly. Thus, instead of using the stand- 
ard instructions for the Purdue Pegboard 
Assembly Task, the subjects were instructed 
to begin each assembly after the first one 
with the same hand used to place the final 
washer on the preceding assembly. On this 
individual task, each person of the pair 
worked on her own pegboard. The two peg- 
boards were placed end to end, although the 
girls could not watch each other since a 
screen was placed between them. 

After the six individual trials, the screen 
and one of the boards were removed from 
the table. The other pegboard was placed 
lengthwise between the two girls. Six more 
trials were taken in which the girls worked 
together on the assemblies instead of work- 
ing individually. The first subject, for ex- 
ample, would place a peg in the first hole on 
her side of the board, after which the second 
subject would add a washer and the first sub- 
ject would follow with a collar, and finally 
the second subject would complete the as- 
sembly with another washer. Instead of re- 
peating this operation, however, the subject 
who finished the first assembly would begin 
the next assembly by placing a peg in the
second hole on her side of the board. The
first subject would then place on the first
washer, and so on. Thus, the assemblies
formed a zigzag pattern down the board,
first one subject beginning the assembly and
then the other, functions alternating each
time. In terms of the kind of set required,
this process appears to be similar to the
individual assembly task in which the subjects
were required to alternate hands in starting
successive assemblies.

* This research was supported by a grant from the
University of California.

On the basis of the sum of scores for the 
last four trials on the individual task, the 
members of pairs were divided into two cate- 
gories, “high” and “low.”  Reliabilities for 
“high,” “low,” and “group” scores were de- 
termined by correlating scores on trials three 
and five with scores on trials four and six 
and correcting by the Spearman-Brown for- 
mula. “Difference” scores were also ob- 
tained by subtracting the “low” total score 
of each pair from the “high” total score. 
All intercorrelations were computed between 
“high,” “low,” “difference,” and “group”
scores. The correlations of the “high” and
“low” scores with the “group” scores were
corrected for attenuation in both variables
and beta weights computed for predicting
“group” scores from “high,” “low,” and
“difference” scores. The multiple correlation
coefficient was also computed. 
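
The two corrections named in this paragraph are standard formulas and can be sketched briefly. The sketch below is illustrative only: the function names are hypothetical and the numerical values are invented for the example, not taken from Table 1.

```python
import math


def spearman_brown(r_half, factor=2.0):
    """Step up a correlation between two comparable parts of a test.

    With factor = 2 this estimates the reliability of the combined
    score from the correlation between its two halves, for example
    trials three and five against trials four and six.
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimate the correlation between error-free measures, given the
    obtained correlation and the reliability of each variable."""
    return r_xy / math.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the arithmetic.
    print(round(spearman_brown(0.50), 3))
    print(round(correct_for_attenuation(0.40, 0.80, 0.80), 3))
```

Because the corrected coefficient divides by the square root of the product of the two reliabilities, it can only equal or exceed the obtained coefficient, which is why corrected values serve as an upper limit of predictability.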


1 In the two previous experiments, the “difference” 
scores were not included in the analysis. It was de- 
cided to add them as a factor of possible influence 
and to give a check on the whole process of obtain- 
ing beta weights. Since the “difference” scores are 
completely determined by the “high” and “low” 
scores, the whole system is overdetermined for com- 
puting beta weights so that only an approximation 
to a solution is usually obtained. The errors in re- 
producing original correlations from the partial re- 
gression equations will be small if the correlations 
are internally consistent. These errors are recorded 
under the “e” column in Table 1. The “high-low” 
correlation was not corrected for attenuation here, 
as in the previous analysis, because this correlation 
should not increase as the proportion of error vari- 
ance decreases. Data from the previous experiments 
have been reworked so that Table 1 gives the proper 
figures for all three studies based on the method of 
analysis decided upon for the present experiment.

In the two previous experiments, as well as
this one, the main objective has been to de- 
termine the extent to which performance of 
persons on a group task could be predicted 
from a knowledge of how well they could do 
individually on a very similar kind of task. 
Corrections for attenuation were used to gain 
some knowledge of the theoretical upper 
limit of this predictability. 


Results 


The results of this experiment and the two 
previous experiments with men are summa- 
rized in Table 1. The results for the men 
are given in rows marked “I” and “II,” re- 
spectively, whereas the figures for the women 
in the present experiment are given in the 
rows marked “III.” In the first column of 
Table 1 are listed the total score categories, 
“high,” “low,” “difference,” and “group”;
means and standard deviations for these sets
of scores are given in the second and third
columns, respectively. The reliabilities are
given in column four, with the intercorrelations
of score variables appearing in the next
four columns. Beta weights are given in the
next column, and in the last column are given
the differences between experimental
correlations with “group” scores and those
reproduced by inserting beta weights and
correlations in the partial regression equations (see
footnote 1).

The pattern of results for university women 
is much the same as that for university men 
except that the over-all multiple correlation 
is only .62 as compared with .66 and .70 in 
the first two experiments, respectively. The 
differences are not significant, however. 

Table 1

Summary of Three Experiments

                                     Correlation with
Score   Expt.  Mean   S.D.   Rel.   High   Low    Diff.  Group  Beta   e

High    I      192    16.5          1.00   .48    .48    .50*
        II     156    18.0          1.00   .46    .51    .47*
        III    160    13.3   .91    1.00   .54    .59    .42*

Low     I      173    16.8   .92    .48    1.00   -.56   .53*
        II     137    17.2   .94    .46    1.00   -.49   .53*
        III    142    13.0          .54    1.00   -.42   .48*

Diff.   I      18.9   17.4          .48    -.56   1.00   -.06
        II     21.2   19.1          .51    -.49   1.00   -.03
        III    18.4   12.9   .85    .59    -.42   1.00   .00

Group   I      178    19.2   .87    .50*   .53*   -.06   1.00
        II     186    19.2   .47    .47*   .53*   -.03   1.00
        III    190    18.8   .86    .42*   .48*   .00    1.00

Note: Experiments I, II, and III were with 65 pairs of male graduate and undergraduate psychology students,
47 pairs of male undergraduate psychology students, and 60 pairs of female undergraduate psychology students,
respectively. The numbers under the “e” column are the discrepancies between obtained correlations with group
scores and those correlations which result from placing the best beta weights in the partial regression equations.
In the three experiments, the multiple correlations were .66, .70, and .62, respectively, with the squares equal to
.44, .49, and .38. The reliability values for the difference scores were computed as the geometric mean of the “high”
and “low” reliability estimates.
* These correlation coefficients were corrected for attenuation in both variables before they were used to
obtain the beta weights.


The most important fact which emerges
from these three experiments is that a
surprisingly small proportion of the total
variance on a group-performance task can be
predicted on the basis of how well the team
members can perform individually on an
apparently similar kind of task. In none of the
three experiments did that proportion reach 
one half. These figures are based on cor- 
relations corrected for attenuation, and, as 
such, represent what could be done with error- 
less measures. Practically, the possibilities 
for prediction are even less impressive. 
These results suggest that there are other 
important behavior variables to be measured 
which will help to determine how well a per- 
son will perform in cooperative kinds of tasks. 
Evidently a careful analysis of the physical 
operations he performs in the group task,
followed by measurement of these job elements
by means of individual tests, leaves out
important sources of variance. Just what the
nature of these other sources of variance may
be is not immediately evident. Further
research will be needed to explore this problem.


Received August 3, 1953.


A Short Method of Factor Analysis ¹

and


Andrew L. Comrey 


The University of California at Los Angeles 


This article will point out the need for a 
short method of factor analysis in certain 
kinds of situations, describe briefly such a 
method, and finally present an empirical com- 
parison of the short method with the com- 
plete centroid method on a particular ex- 
ample. 

Occasions frequently arise in psychological 
research where several variables have been 
developed over a period of time to assess cer- 
tain characteristics of individuals, groups, and 
situations. The variables may be scores on 
individual items, groups of homogeneous 
items, or other types of data. At some point,
revision may be needed to yield a smaller 
number of variables which will cover the 
domain with equal effectiveness and greater 
economy. This economy may be achieved 
through the elimination of variables which 
are measuring only those functions already 
assessed elsewhere, and through the retention 
of relatively uncorrelated measures. 

Where any considerable number of vari- 
ables is involved, factor analysis constitutes 
a useful technique for providing the informa- 
tion upon which such a program may be un- 
dertaken. Unfortunately, the most generally 
used factor-analysis procedures are so time 
consuming that they are often by-passed in 
situations where they could be helpful. To 
factor analyze the items of a test, for ex- 
ample, using the complete centroid method 
becomes quite an expensive and time-consuming
assignment if the test contains as
many as thirty or forty items.² Or, if we
are dealing with about the same number of
questionnaire variables, each variable being
assessed by means of a pool of homogeneous
items, a similar situation prevails. Even
though a factor analysis might be valuable,
then, it may not seem feasible to take on the
amount of labor required in the methods
traditionally employed.

1 This research was carried out under Contract
N6-ONR-23815 between the University of Southern
California and the Office of Naval Research. The
research was supervised by R. C. Wilson. The
modified diagonal method of factor analysis herein
discussed was conceived by A. L. Comrey. The
opinions expressed are our own and are not necessarily
shared by the Office of Naval Research. J. M.
Pfiffner, J. P. Guilford, and H. J. Locke are the
co-responsible investigators.

Thurstone (1) has described a diagonal 
method of analysis which is considerably less 
laborious than a complete centroid solution, 
especially when the problem of rotation is 
taken into account. Thurstone feels that the 
diagonal method, while theoretically correct, 
is inadequate as a method of analysis because, 
generally, the communalities which are in- 
serted in the diagonals cannot be very ac- 
curately estimated. Since the mechanics of 
the method place a rather heavy dependence 
upon the diagonal cell values, inaccuracies in 
these values may produce considerable dis- 
tortion in the final result. 

Although it is difficult to obtain accurate 
estimates of the communalities, it is fre- 
quently possible to compute good reliability 
estimates for the tests. Since test reliabilities 
can be estimated more accurately than test 
communalities, the accuracy of a diagonal 
method of analysis can be improved by sub- 
stituting reliabilities for communalities in the 
diagonals. This changes the nature of the 
problem to some extent, however. Instead of 
analyzing only the common factor variance, 
or some estimate of the extent of that
variance, the total true variance is used,
including what would ordinarily be assigned to
specific factors if communality estimates were
used.

2 An iterative method has been suggested (3) for
factor analyzing test items. The method described
in this paper could be employed effectively in the
preliminary stages of such an analysis for the
purpose of shortening the iterative process.

Many factor analysts would object to ana- 
lyzing the total test space in that they are 
looking for factors representing underlying 
variables which “explain” the common ele- 
ments among the variables concerned. If 
this is the objective, of course, it would be 
unsatisfactory to use the diagonal method 
modified by using reliabilities in the diagon- 
als instead of guessed communalities. On the 
other hand, careful analysis of the purposes 
in factoring may reveal that the objective is 
not at all that of discovering some latent ex- 
planatory structure. There may be, for ex- 
ample, several well-developed scales which 
need to be refined and extended, but which 
cannot be discarded just because a factor 
analysis solution suggests that the “best” 
variables are somewhere in between the ones 
actually in use.³

The problem under such circumstances is 
more nearly one of imposing a relatively arbi- 
trary structure upon the domain rather than 
attempting to develop a new set of variables 
of special intrinsic explanatory value. When 
this is the case, there is no good reason why 
the first factor should not be aligned with the 
best developed and most useful variable. The 
next factor can go through or near a variable 
of established value which is approximately 
orthogonal to the first one. 

In this way, the factors can be made to 
coincide in so far as possible with variables 
which are already serving adequately at the 
time. Other variables fall where they may, 
and are eliminated as they fail to add any- 
thing to the structure needed to lay out the 
domain under investigation. As a result of 
such an analysis, some variables may need to 
be refined in certain directions to make them 
more independent of one another. Other
variables can safely be dropped altogether in
that they are not adding anything new and
the factorial picture will very likely suggest
new areas in which further variables may
profitably be developed.

3 Thurstone (2) reports a study of Guilford’s
temperament schedules, in which he wished to know
how many factors were represented in the 13 scores,
rather than in ascertaining their common factors.
For this purpose he used reliability estimates in the
diagonal cells and factored the matrix by the
centroid method.

The considerations just presented led the 
authors to attempt an empirical check of the 
diagonal method using reliabilities in order to 
gain some information concerning how it 
might compare with the complete centroid 
method. The results of that empirical check 
will be presented after a brief description of 
the mechanics of the method. 


The Method 


Thurstone (1) has described the diagonal 
method fully, but a brief repetition of the 
essential steps may be helpful: 


1. Compute the correlation matrix as usual, 
inserting test reliabilities in the diagonal cells. 
This is at variance with the diagonal method 
described by Thurstone (1) in that guessed 
communalities would conventionally be used. 

2. Take the square root of the reliability 
which is largest and divide every correlation 
in the corresponding column by this number. 
The resulting quotients are factor loadings 
for Factor I. 

3. The variance due to Factor I is re- 
moved from the correlation matrix by ob- 
taining the matrix of inner products of the 
first factor loadings and subtracting them 
from the original correlation matrix, includ- 
ing the diagonal values. Thus, if tests 2 and 
3 had loadings of .6 and .7 in Factor I, .6 x 
.7 = .42 would be subtracted from the origi- 
nal correlation between tests 2 and 3. The 
result of this operation would be entered in 
the matrix of first factor residuals. 

4. The entire process is repeated, each time 
taking the column with the highest remaining 
diagonal value, until the unextracted variance 
is presumed to be largely error. This will 
usually be evident when the square roots of 
diagonal entries begin to get small in relation 
to residual column entries, resulting in obvi- 
ously inflated factor loadings. The exact 
point at which factor extraction should cease, 
however, must remain to a considerable ex- 
tent a matter of judgment. 

5. The matrix of factor loadings obtained 
in this fashion is ready for such rotations as
may be necessary to align the factors satis- 
factorily. Since the factors have been ex- 
tracted in a manner designed to make them 
similar in nature to certain existing variables, 
the number of rotations required to finish the 
solution generally will be rather small. 
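
The five steps above can be sketched in code. What follows is a minimal illustration under stated assumptions, not the authors' computing routine: the function name `diagonal_factoring`, the stopping threshold `min_diagonal`, and the three-test matrix are all hypothetical. The matrix was chosen so that the .6 x .7 = .42 subtraction of step 3 occurs when Factor I is removed.

```python
import math


def diagonal_factoring(r, max_factors, min_diagonal=0.05):
    """Extract factors from a correlation matrix by the diagonal method.

    r            -- square symmetric matrix (list of lists) with reliability
                    estimates in the diagonal cells (step 1)
    max_factors  -- upper limit on the number of factors to extract
    min_diagonal -- hypothetical stopping rule: stop when the largest
                    remaining diagonal entry falls below this value
    """
    n = len(r)
    residual = [row[:] for row in r]      # work on a copy of the matrix
    loadings = []                         # one list of n loadings per factor
    for _ in range(max_factors):
        # Steps 2 and 4: choose the column with the largest diagonal value.
        pivot = max(range(n), key=lambda i: residual[i][i])
        diagonal = residual[pivot][pivot]
        if diagonal < min_diagonal:
            break
        root = math.sqrt(diagonal)
        factor = [residual[i][pivot] / root for i in range(n)]
        loadings.append(factor)
        # Step 3: subtract the inner products of the loadings,
        # including the diagonal cells.
        for i in range(n):
            for j in range(n):
                residual[i][j] -= factor[i] * factor[j]
    return loadings, residual


# Hypothetical three-test matrix with reliabilities .81, .61, and .53.
r = [
    [0.81, 0.54, 0.63],
    [0.54, 0.61, 0.52],
    [0.63, 0.52, 0.53],
]
loadings, residual = diagonal_factoring(r, max_factors=3)
for number, factor in enumerate(loadings, start=1):
    print("Factor", number, [round(x, 2) for x in factor])
```

In practice the stopping point remains a matter of judgment, as step 4 notes; the fixed threshold here only stands in for that judgment.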


Results of Two Analyses 


A previous article (4) reported the results 
of a complete centroid analysis of certain 
questionnaire variables used by the authors 
in the study of supervision and employee 
attitudes in a naval shipyard. A modified 
diagonal method of analysis was also applied 
to the same correlation matrix, using reli- 
abilities instead of guessed communalities, to 
determine the extent of the discrepancies 
which might occur between the results of the 
two different procedures. The previous ar- 
ticle treated the nature and significance of 
the centroid factorial results, so that we will 
be concerned here only with comparing the 
two solutions, rather than with presenting the
findings of the factorial process. 

In the centroid analysis, nine factors were 
extracted. Seven factors were interpreted, 
leaving two residual factors. Of the seven 
interpreted factors, three were factors specific 
to doublet variables included in the analysis. 
In the modified diagonal analysis, seven fac- 
tors were extracted, two of which were dou- 
blets. The two solutions achieved excellent 
agreement on six factors, leaving each with 
a doublet factor not clearly present in the 
other analysis. Table 1 presents the com- 
parative findings for the two studies. Load- 
ings are listed only where the value in one or 
the other analysis was .40 or more. For in- 
formation on the nature of the variables in- 
volved, the reader is referred to the previous 
report (4). 


Discussion 


The total amount of labor expended in 
carrying out the modified diagonal analysis 
was approximately one-eighth of that required
to complete the full centroid analysis,
including the rotations. Much of the saving
is in the rotation process itself, since only a
minimum of readjustment of the axes is necessary
in the modified diagonal case. In the
present example, 14 graphical orthogonal ro- 
tations were required for the modified di- 
agonal analysis, and 79 such rotations were 
carried out with the centroid factors. 

The agreement between the two analyses, 
as evidenced by the data in Table 1, would 
suggest that the loss in accuracy is not great 
in comparison with the time saved, provided 
the over-all objectives of the analysis are con- 
sistent with a sacrifice of this kind. 

The 25 variables for this analysis were de- 
rived from 13 dimensions or item pools by di- 
viding each dimension into comparable halves 
or sub-dimensions on the basis of item con- 
tent. One sub-dimension was discarded be- 
cause of lack of item homogeneity. The cor- 
relations between comparable halves were 
used as the reliability estimates to be in- 
serted in the diagonal cells. For 15 of the 
variables this correlation between halves was 
their highest correlation. Since the highest 
correlation in the column was used as the 
communality estimate in the centroid analy- 
sis, 15 of the diagonal cell values were the 
same for both analyses. This favored a simi- 
lar outcome in both analyses. 
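
The two kinds of diagonal entry described in this paragraph can be illustrated briefly. The sketch below is hypothetical throughout: the scores, the function names, and the small matrix are invented for illustration. It computes a between-halves correlation of the kind inserted as a reliability estimate, and picks the highest correlation in a column as the communality estimate used in the centroid analysis.

```python
import math


def correlation(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def highest_in_column(r, column):
    """Largest correlation in a column, ignoring the diagonal cell."""
    return max(r[i][column] for i in range(len(r)) if i != column)


# Hypothetical scores of five respondents on two comparable half-dimensions.
half_a = [1, 2, 3, 4, 5]
half_b = [2, 1, 4, 3, 5]
reliability = correlation(half_a, half_b)

# Hypothetical correlations among three variables (diagonal cells unused).
r = [
    [None, 0.80, 0.35],
    [0.80, None, 0.40],
    [0.35, 0.40, None],
]
communality_estimate = highest_in_column(r, 0)
print(round(reliability, 2), communality_estimate)
```

When the between-halves correlation is also the highest correlation in the column, as here, the two analyses start from the same diagonal value.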

Inspection of Table 1 reveals that many of 
the differences in loadings between the two 
analyses may be attributed to discrepancies 
in the amount of variance extracted. These 
differences in extracted variance are revealed 
by comparison of the sums of squares of fac- 
tor loadings for each variable in the two 
analyses. Variables 1 and 2, for example, 
had low diagonal entries for the modified di- 
agonal analysis because the reliability esti- 
mate obtained by correlating variables 1 and 
2, which were supposedly comparable half- 
dimensions, was low. Evidently the low re- 
liability estimate was due, in some degree, to 
lack of comparability between halves; the 
communalities for the centroid analysis were 
higher. Had higher values been inserted in- 
stead of the reliability estimates actually 
used, the loadings for variables 1 and 2 would 
have been higher. It is expected that agreement
of factor loadings using these two methods
will be greater for variables where the
amount of common factor variance approaches
that of the true variance.


Table 1

Comparative Factor Loadings *

[Table body: loadings of the questionnaire variables on Factors I through VII, with the centroid (c) and diagonal (d) values side by side for each factor, followed by the communalities for the two solutions. The individual entries are not legible in the scanned copy.]

* Questionnaire variables are numbered down the left side of the table. Factors are numbered across the
top, with solutions of the centroid (c) and diagonal (d) analyses placed side by side for each factor.
Loadings are shown only in those cases where the loading obtained by one of the two methods is .40 or more.
Decimal points have been omitted.


Summary 


Occasions arise where it is desirable to ap- 
ply factor analytic techniques, but the ex- 
ploratory nature of the work and the time 
available may not justify a complete centroid 
analysis. A diagonal method, modified by 
using reliabilities instead of guessed com- 
munalities in the diagonal cells, is suggested 
as a satisfactory and economical substitute 
for the complete centroid analysis under cer- 
tain conditions. The results of an empirical 
comparison of this method with the complete 
centroid method on one correlation matrix 
show that the two agreed fairly closely upon
most of the factors obtained. 


It is a common belief that many consumer’s
goods are identical, at least in the sense
that they are indistinguishable once the wraps 
are removed. Recent studies by Pronko and 
his colleagues (3, 4) seem to have demon- 
strated this with respect to cola beverages; 
the case for cigarettes seems to have been 
settled the other way in one set of investiga- 
tions (1, 6) and positively in a recent study 
(5). However, many of the studies are not 
conclusive because of errors of procedure or 
analysis. 

The reasons for objection differ for the dif- 
ferent studies. They reduce, however, to a 
question of the type of judgment asked of 
Ss. For example, let us consider the studies 
of Pronko (3, 4). In these studies, various 
cola brands were administered to Ss who were 
asked to identify them by name. In two 
studies using popular varieties of cola, discrimination
was random, i.e., names were applied
on a chance basis. Consequently, in a
third study three obscure brands were used. 
Not once were they properly identified; in- 
stead the names of the major brands were 
applied, also in a random fashion. The au- 
thors conclude that “the seven brands of 
Cola beverages employed in our series of 
studies appear to have the same stimulus 
function for our subjects and may be said to 
be ‘equivalent stimuli’” (3, p. 608). 

This conclusion seems unwarranted, for
even if there were a discriminable difference
among the different colas, Ss could hardly express
their awareness of this difference if
they were unfamiliar with the names to be
applied to them. It may be that the colas
do indeed have equivalent stimulus values,
but the use of an identification judgment, as in this
case, does not establish that conclusion.
If Ss apply names with better than
chance accuracy, one may conclude that discriminable
differences exist; but random results
do not justify the conclusion that such
differences do not exist. For example, what
would the results have been if Ss were asked
to respond by “same” or “different”? There
is no a priori reason to expect comparative
judgments to yield the same results as identification
judgments.


1 The data were collected by Mr. Manning for a
Master’s thesis at the University of Oregon.

This importance of the type of judgment 
used suggests that discriminatory ability in 
this sort of study is a function of test pro- 
cedures as well as test materials. The field 
of psychophysics provides ample evidence 
that this is the case. The questions raised 
here merely spell out the well-founded gen- 
eralizations concerning the relationships be- 
tween discrimination and procedures. 

In the present study, two specific questions 
were raised: (1) Is there evidence that indi- 
viduals can discriminate among different ciga- 
rette brands? (2) Are there differences in 
the patterns of judgment for two different 
kinds of discriminations, viz. recognition and 
affective? 

The idea of comparing affective and recog- 
nition judgments is based on the following 
consideration. While the ability to apply a 
name correctly to something requires some 
specific training, the ability to say whether 
one likes or dislikes something ordinarily does 
not. To be sure, likes and dislikes may be 
radically changed as a result of experience, 
as may the willingness to say one likes some- 
thing. Even the very nature of the qualita- 
tive experience may be altered as a result of 
such experiences. It seems, nevertheless, that 
at any given time almost any object can be 
responded to affectively, even in the absence 
of past contact with that object. 

It may be this very universality that permits
affective experiences to play such a
great role in behavior; on one’s first contact 
with some particular object, say a wine or 
tobacco, the only reaction he may have avail- 
able is an affective one. With this in mind, 
it was hypothesized that one of the things 
that determined preferred brands would be 
one’s affective response to them, and there- 
fore one’s usual brand would tend to be liked 
more than other brands. For any given sam- 
ple of objects, this does not exclude the pos- 
sibility that other objects may be liked even 
more, or that all of these objects may be 
equally liked. 

While this hypothesis might well have been
wrong, if any differential association between
brands and affective judgment appeared, the
use of affective judgments would still merit
consideration as a possible new
approach to some common problems of
discrimination. Indeed, the “method of im- 
pression” lies at the heart of psychophysics. 
However, the use to which it is put here is 
somewhat unusual since it is used as a “sensi- 
tivity” test rather than a simple preference 
test. Between the limiting cases of 100%
“like” and 100% “dislike,” judgments of this
sort might be used where it is difficult or impossible
to develop in Ss any other system of
reporting their discriminations.

Originally we had conceived of comparing 
the recognition and affective judgments for 
accuracy. This has proved possible only in 
the roughest way because of the logical prob- 
lems attending the definition of “accuracy” 
for the affective judgments. In a sense, affec- 
tive judgments cannot be veridical instru- 
ments; one may have an affective experience, 
but one does not have a correct affective ex- 
perience. But one can properly seek to cor- 
relate such experiences with known differ- 
ences like preferences or brands. So, when 
we speak of accuracy the reader may substi- 
tute “sensitivity” in the case of affective judg- 
ments without serious harm to the thread of 
the discussion. If this phase of the study is 
thought of as a methodological inquiry into 
different response procedures permitted Ss, 
then while accuracy may be asserted only of 
the recognition judgments, sensitivity may be 
asserted of the results of such judgments in 
terms of differential association with some 
known differences, viz. brand preference and
brand differences. In other words, the methods
are being studied in terms of their sensitivity
or accuracy, not the judgments themselves.

Technical problems prevented the study’s 
being carried out with colas so that one can- 
not say anything about cola discrimination 
on the basis of these results. But the logic 
of the argument should be generalizable to 
discrimination studies in any modality, and 
that is our main objective. 


Procedure 


Materials. Three brands of cigarettes were 
used: Camel, Chesterfield, and Lucky Strike. 
These were selected because a preliminary sur- 
vey indicated that they accounted for about 78 
per cent of the brands used by a sample of 239
students at the University of Oregon. Philip 
Morris ran a close fourth, but was not used be- 
cause of the difficulty in concealing identifying 
print. It will be seen that the method of in- 
vestigation is not seriously affected by this omis- 
sion. The students involved in the survey were 
not the same ones who participated in the main 
study. 

Each cigarette was banded with a gummed 
label applied in a tacky state. While this 
method, which was also used by Ramond, 
Rachal, and Marks (5), undoubtedly removes 
some cues usually used by smokers, e.g., “So 
round, so firm, so fully packed... .” it was 
felt that the procedure of blindfolding presented 
more serious difficulties. Each S, therefore, 
smoked a cigarette whose upper half was covered 
by a paper label, but whose lower half was com- 
pletely normal. 

Subjects. There were 288 Ss culled from in- 
troductory psychology classes. There were some- 
what more male than female Ss, though the data 
have not been analyzed by sex. They were re- 
cruited by having the following instructions read 
at the beginning of the class hour: 

“We are conducting an investigation in ciga- 
rette judgments, and would like your cooperation. 
Will all of you who are regular cigarette smokers 
raise your hands? .. . Will you file out indi- 
vidually by rows, so that when one returns, the 
next may leave, etc. The experiment, which is 
not unpleasant, should take about five minutes of 
your time. Thank you very much.” 

For reasons made clear under Routine, 246 of 
these Ss were utilized in the analysis. 

Routine. Ss were divided into two groups by 
alternating assignments as they entered the ex- 
perimental room. Each S was greeted in the 
following manner: 

“Come in. Just have a seat here, please. 
Now, I’m going to give you a cigarette and I'd 
like you to put it in your mouth, and when you’re
ready, I’ll light it for you. (E hands cigarette
to S.) Take three or four puffs, more if you
like, just as you normally would,

(A) then tell me whether you like or dislike 
it; or 

(B) then tell me whether you think it is or is 
not the brand you usually smoke. 

“Ready? ... Here’s your light.” (E lights
cigarette for S, waits till S indicates judgment is
made, then takes cigarette and records response.)
“Do you have a package of cigarettes with you?
May I see it? Is this the brand you usually
smoke?” (E records response.) “That’s all and
thank you very much. Since the success of this 
experiment depends in part upon people not 
knowing what to expect when they enter, I’d ap- 
preciate it if you didn’t discuss the procedure 
with your friends.” 

Those Ss assigned to the affective group were 
given the (A) portion of the instructions; the 
recognition group was read the (B) statement. 
In all other respects treatment was identical. 

There are four things especially to note about 
the instructions and the procedure. 

1. E did not know what brand of cigarette S 
smoked in the experiment until the latter had 
left the experimental room. The cigarettes had 
previously been masked and placed together. E 
arbitrarily selected one from a paper sack and 
gave it to S. 

2. Each S made only one judgment. Previous 
studies have erred seriously in this respect. The 
statistics available for analyzing situations of this 
sort invariably call for independence of measure- 
ments. The repetition of tests upon the same S 
is frequently unanalyzable because of the impos- 
sibility of computing any meaningful coefficient 
to express the relationship between their judg- 
ments. In addition, it seems unlikely that the 
usual methods of adapting S by having him wash 
out his mouth between drinks or tasting a mint 
between puffs are entirely satisfactory, unless one 
can be sure that there is no cumulative effect of 
the neutralizer. Finally, if the materials to be 
discriminated are in fact discriminable, it must 
be demonstrated in some independent way that 
the “trace-reducer” does not have a differential 
effect upon the trace. If such a differential in- 
teraction actually exists, none of the previously 
reported studies is designed so that its effect 
could be properly evaluated as a source of error.

3. Ss indicated their reaction by stating only 
whether they liked or disliked the cigarette or 
whether it was the brand they usually smoked. 
The reason for this technique is as follows. A 
like-dislike judgment is disjunctive; recognition 
judgment in the form of a naming response 
would not be, though it could be converted into
one. In order to keep the response as uniform 
as possible, however, Ss were asked to make a 
disjunctive identification. It should be noted 
that they were not given any information concerning
the brands offered to them. For all they
knew, they might have been getting “off brands.”
The similarity of this portion of the procedure 
to that of Pronko and his colleagues is obvious. 
Not giving S a set of possibilities to draw on 
prevents the computation of a chance level of 
success; this will be more fully discussed in the 
Analysis. 

4. After S made his judgment he was requested 
to exhibit a pack of cigarettes, and indicate 
whether it was the brand he usually smoked.
As a result of this step, 42 Ss were eliminated 
from the analysis; either they did not smoke one 
of the three brands selected for study, or they 
had no cigarettes with them. While some regu- 
lar smokers were undoubtedly excluded by this 
tactic, the likelihood of getting the occasional 
smoker seems small. As a matter of experi- 
mental procedure, it seems fair only to include 
individuals who may be expected to show the 
maximal degree of discrimination, i.e., the regular
smoker or drinker. Finally, the use of this
device ensured complete obscurity of the test 
stimuli from the Ss; to have indicated prefer- 
ence for smokers of certain brands alone would 
have cancelled the precautions to make the rec- 
ognition judgment as comparable to the affective 
one as possible. 

Test Locus. For all but 15 Ss, the experiment 
was carried out in the same room. It was high- 
ceilinged, with a large window, a single ceiling 
fixture, two chairs and two tables. The room 
was thoroughly aired out before the next subject 
was admitted. All work was done during day- 
light hours. The remaining 15 Ss were studied 
in another room similarly furnished. Since they 
were distributed evenly (save for one) between 
the two groups, if there were any measurable ef- 
fect attributable to the difference between the 
rooms it could serve only to increase the variance
of the two distributions. It was decided,
therefore, to leave this possible source of variance
in the error term of the statistic.


Results 


In Table 1 the results for the recognition 
and like-dislike judgments are presented. In 
each cell, the top number represents the total 
number of “yes” or “like” responses, respectively,
and the lower number, the total number
of observations in this category. Thus
the bottom numbers represent the total of 
“yes” plus “no” responses, and of “like” plus 
“dislike” responses, respectively, in each cate- 
gory. As will be seen, the consigning of any 
observation to a particular category depends 
on three variables, viz. the brand of cigarette 
S regularly smoked, the brand he sampled, 
and which type of judgment he was asked to 
make. 
Table 1
Frequency of Judgments

Recognition
                            Brand Preference
Brand Sampled               Ca     Ch     LS

Ca            Yes          ...      3      3
              Total        ...     11     15

Ch            Yes            4      3      2
              Total         16     10     16

LS            Yes            3      3      7
              Total         16      9     16

Like-Dislike
                            Brand Preference
Brand Sampled               Ca     Ch     LS

Ca            Like         ...    ...    ...
              Total        ...    ...    ...

Ch            Like         ...    ...    ...
              Total        ...    ...    ...

LS            Like         ...    ...     10
              Total        ...    ...     17

[Entries shown as ... are not legible in the scanned copy.]

It is readily seen that the design of this 
study permits an analysis of variance tech- 
nique. Therefore a three-way analysis was 
carried out. Independence of observations is 
assured by having only one judgment per S. 
The raw data are frequencies and are un- 
usable in that form. They were converted 
into a relatively normalized distribution by 
means of the arcsine transformation (2, 6). 
While there are variations in the number of 
observations per cell, comparison of the theo- 
retical vs. the obtained residual variances in- 
dicated little damage to the resulting analysis. 
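
The transformation can be sketched as follows. This is an illustration under an assumption: the text does not state which form of the arcsine transformation was applied, and the sketch takes the common one, the angle in degrees whose sine is the square root of the proportion. That form appears consistent with the magnitudes of the sums of squares in Tables 3 and 4. The function names are hypothetical. The theoretical variance of such an angle for a cell of n observations is (180/pi)^2 / 4n, or about 821/n square degrees, which is the quantity against which an obtained residual variance can be compared.

```python
import math


def arcsine_degrees(p):
    """Angle, in degrees, whose sine is the square root of the proportion p."""
    return math.degrees(math.asin(math.sqrt(p)))


def theoretical_variance(n):
    """Large-sample variance of the transformed value for a cell of n
    observations: (180 / pi)^2 / (4 n), roughly 821 / n square degrees."""
    return (180.0 / math.pi) ** 2 / (4.0 * n)


# Recognition cells whose "yes" and total counts are legible in Table 1.
cells = [(3, 11), (3, 15), (4, 16), (3, 10), (2, 16), (3, 16), (3, 9), (7, 16)]
for yes, total in cells:
    angle = arcsine_degrees(yes / total)
    print(yes, total, round(angle, 1), round(theoretical_variance(total), 1))
```

Because the cell sizes differ, the theoretical variance differs from cell to cell; this is the variation in the number of observations that the text mentions.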

Table 2 shows the data converted to pro- 
portions, and Table 3 summarizes the result- 
ing analysis of variance. A problem always 
arises concerning the proper error term in 
evaluating the significance of the various ef- 
fects. We have tested for AB, AC, and BC 
interaction by the F-ratio, with ABC inter- 
action as the denominator. AC and BC are 
not significant at the five per cent level, while 
AB seems high enough to warrant another 
test. Consequently we have pooled the AC
and BC interactions with the ABC interaction
in order to obtain more degrees of freedom
and a more powerful test. The F test
for AB interaction then becomes:

$$F = \frac{136}{25.75} = 5.28, \quad \text{against } F_{.05}\ (4, 8\ df) = 3.84,$$

while for C-effect we have:

$$F = 22.17, \quad \text{against } F_{.05}\ (1, 8\ df) = 5.32.$$

Since the AB interaction is significant, the appropriate
test for the A-effect is:

$$F = \frac{56.5}{136} = 0.415, \quad \text{against } F_{.05}\ (2, 4\ df) = 6.94,$$

while that for the B-effect is:

$$F = \frac{42.5}{136} = 0.312, \quad \text{against } F_{.05}\ (2, 4\ df) = 6.94.$$

The following significant effects emerge: 
AB interaction and C. What does this mean? 
The interaction of brand preference and 
brand sampled, AB, results in a change not 
attributable to either variable alone. That 
is, Ss did like their own brands more often 
or identify them as their own more often than 
they did other brands. One may conclude 
from this that there exist differences in brands 
such that regular users of these brands can 
distinguish among them. 
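
The three-way computation can be retraced from the proportions of Table 2. The sketch below is an illustration, not the authors' worksheet. It assumes the arcsine transformation in degrees, one transformed value per cell, and the orientation of Table 2 as printed (rows for brand sampled, columns for brand preference); all variable names are hypothetical. Under these assumptions the column and row classifications give sums of squares close to the two main-effect values of Table 3 (113 and about 85), and the row-by-column interaction comes out near 544. Which classification Table 3 calls A and which B depends on the orientation assumed here.

```python
import math

# Proportions from Table 2: rows are brand sampled (Ca, Ch, LS) and
# columns are brand preference (Ca, Ch, LS), an assumed orientation.
recognition = [
    [0.412, 0.273, 0.200],
    [0.250, 0.300, 0.125],
    [0.187, 0.333, 0.437],
]
like_dislike = [
    [0.714, 0.445, 0.250],
    [0.412, 0.500, 0.357],
    [0.571, 0.333, 0.588],
]


def angle(p):
    """Arcsine transformation in degrees (assumed form)."""
    return math.degrees(math.asin(math.sqrt(p)))


# x[judgment][row][column], one transformed value per cell
x = [[[angle(p) for p in row] for row in table]
     for table in (recognition, like_dislike)]

n_cells = 18
grand_total = sum(v for table in x for row in table for v in row)
correction = grand_total ** 2 / n_cells


def sum_of_squares(totals, cells_per_total):
    """Sum of squares for a set of marginal totals."""
    return sum(t ** 2 for t in totals) / cells_per_total - correction


row_totals = [sum(x[k][i][j] for k in range(2) for j in range(3))
              for i in range(3)]
column_totals = [sum(x[k][i][j] for k in range(2) for i in range(3))
                 for j in range(3)]
judgment_totals = [sum(v for row in x[k] for v in row) for k in range(2)]
row_by_column_totals = [x[0][i][j] + x[1][i][j]
                        for i in range(3) for j in range(3)]

ss_rows = sum_of_squares(row_totals, 6)
ss_columns = sum_of_squares(column_totals, 6)
ss_judgment = sum_of_squares(judgment_totals, 9)
ss_interaction = (sum_of_squares(row_by_column_totals, 2)
                  - ss_rows - ss_columns)

# Main effects tested against the row-by-column interaction (2 and 4 df).
ms_interaction = ss_interaction / 4
f_columns = (ss_columns / 2) / ms_interaction
f_rows = (ss_rows / 2) / ms_interaction
print(round(ss_columns), round(ss_rows), round(ss_judgment),
      round(ss_interaction))
print(round(f_columns, 3), round(f_rows, 3))
```

Small departures from the printed values are to be expected, since the sketch starts from proportions rounded to three places.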


Table 2

Proportions of Judgments

Recognition
                      Brand Preference
Brand Sampled         Ca       Ch       LS

Ca                   0.412    0.273    0.200
Ch                   0.250    0.300    0.125
LS                   0.187    0.333    0.437

Like-Dislike
                      Brand Preference
Brand Sampled         Ca       Ch       LS

Ca                   0.714    0.445    0.250
Ch                   0.412    0.500    0.357
LS                   0.571    0.333    0.588
Table 3
Analysis of Variance: Three Way

                                  Sum of            Mean
Source                            Squares    df     Square    F-ratio

A-effect (Brand Sampled)            113       2      56.5
B-effect (Brand Preferred)           85       2      42.5
C-effect (Recognition vs.
  Like-Dislike)                     ...       1      ...
AB                                  544       4     136        4.39*
AC                                  ...       2      ...       1.26†
BC                                  ...       2      ...       0.0005†
ABC                                 ...       4      ...

Total                               ...      17

* Vs. F.05 (4, 4) = 6.39.
† Vs. F.05 (2, 4) = 6.94.

[Entries shown as ... are not legible in the scanned copy.]


But what of the other hypothesis outlined, 
viz. that the affective judgment technique is 
more sensitive or accurate than the use of 
recognition judgments? In certain respects 
the findings are surprising. The proportions 
for the two kinds of judgments are different, 
i.e., the ratios of “like-dislike” and “my
brand: yes-no” responses are different. These 
different ratios cannot be directly compared, 
however, to determine the relative accuracy 
of the two kinds of judgments. In order to 
do so it would be necessary to determine 
some level of chance expectancy; this can- 
not be done for the present design where Ss 
were unaware of the possibilities available to 
them. 

However, it is possible to re-analyze the 
data in order to estimate the relative ac- 
curacy of the two kinds of judgments. This 
is done by constructing a two-way table for 
each of the judgments separately. The re- 
sults are shown in Table 4. It will be seen 
that A- and B-effects are not significant in 
either of the analyses. This is similar to the 
finding in the three-way table above, and 
again indicates no significant effect due either 
to brand sampled or brand preference alone. 
Now, the three-way analysis indicated that 
the AB interaction was present. There is 
no way of evaluating the AB interaction for 
the two-way analyses, but they can be evaluated
against one another:

$$F = \frac{AB\ \text{(like-dislike)}}{AB\ \text{(recognition)}} = \frac{97.0}{70.2} = 1.38,$$

against $F_{.05}$ (4, 4 df) = 6.39.


We conclude that there is no difference be- 
tween AB interaction on like-dislike or recog- 
nition judgments, i.e., neither one is signifi- 
cantly superior as a means of distinguishing 
between brands. It should be pointed out 
that there is a tendency for the like-dislike 
judgments to be superior. With samples as 
large as these, however, it hardly seems large 
enough to be of practical significance. 
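
The arithmetic behind this comparison can be retraced from the sums of squares and degrees of freedom printed in Table 4. The sketch below is only a check on that arithmetic; the dictionary layout and names are hypothetical.

```python
# Sums of squares and degrees of freedom as printed in Table 4.
table_4 = {
    "recognition": {"A": (21, 2), "B": (61, 2), "AB": (281, 4)},
    "like-dislike": {"A": (170, 2), "B": (28, 2), "AB": (388, 4)},
}


def mean_square(ss, df):
    """Mean square: sum of squares divided by its degrees of freedom."""
    return ss / df


# Within each two-way analysis, A and B are tested against AB.
f_ratios = {}
for judgment, sources in table_4.items():
    ms_ab = mean_square(*sources["AB"])
    for effect in ("A", "B"):
        f_ratios[(judgment, effect)] = mean_square(*sources[effect]) / ms_ab

# The two interaction mean squares evaluated against one another.
interaction_ratio = (mean_square(*table_4["like-dislike"]["AB"])
                     / mean_square(*table_4["recognition"]["AB"]))
CRITICAL_F_4_4 = 6.39  # five per cent point for 4 and 4 df, as quoted in the text
print(round(interaction_ratio, 2), interaction_ratio > CRITICAL_F_4_4)
```

The ratio falls far short of the critical value, which is the basis for concluding that neither kind of judgment is significantly superior.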


Discussion 

In the introduction, the use of a naming pro- 
cedure was taken to task. The essence of the 
argument, it will be recalled, was that even if 
two or more stimuli were appreciably different to 
S, he might not be able to express his awareness 
of this difference if the task demanded of him be 
one of applying names to the stimuli. To test 
this, Ss were presented with either a recognition 
or an affective assignment. Both seemed about 
equally accurate. 

Table 4
Analysis of Variance: Two Way

                       Sum of           Mean
Source                 Squares    df    Square    F-ratio

Recognition Judgments
A-effect                  21       2     10.5     0.149*
B-effect                  61       2     30.5     0.434*
AB interaction           281       4     70.2
Total                    363       8     45.4

Like-Dislike Judgments
A-effect                 170       2     85.0     0.876†
B-effect                  28       2     14.0     0.144†
AB interaction           388       4     97.0
Total                    586       8     73.6

* Tested against AB interaction for recognition.
† Tested against AB interaction for like-dislike.


Now it is certainly true that to name something
is to identify it. However, to identify
something does not require that it be named.
In other words, the process of recognition is
broader than that of appellation. In either case,
it is clear that there is some other process which
we may call discrimination upon which recognition
rests, logically if not temporally. The success
of the affective judgment attests to this.
There is, to be sure, the likelihood that distinctions
between test objects which are based upon
affective judgments involve entirely different
kinds of discriminations from those based upon
recognition. The fact that the proportions of the
two kinds of judgments differed indicates that we
did not have a single process. The tendency for
like-dislike to be superior is further support along
these lines, though the difference is obviously of
theoretical rather than practical import, being of
such a small magnitude. It is further likely that
different kinds of recognition, e.g., naming or
“sorting,” involve different processes in part.
After all, to name something properly requires a
greater amount of precision in using the various
cues provided by the stimulating object or event.
But it is obvious that a sound investigation in
an area of this sort requires that such distinctions
be kept in mind.

The study seems to have demonstrated, also, 
that one need not be restricted to “cognitive” re- 
actions to objects in order to test for discrimina- 
tion of an object’s properties. To be sure, affec- 
tive reactions tap different properties; but in the 
field of discrimination where in field situations 
the exact basis for the discrimination is often 
unknown, their use seems to be another possi- 
bility. Identification, recognition, etc. may have 
an affective base as well as the usually assumed 
sensory base. It then becomes a fascinating 
problem to tease out the role of the two factors 
in any given series of discriminations and most 
challenging to determine what cues are respon- 
sible for the affective judgment. In any case, if 
a substantial correlation between the two types 
of judgments can be demonstrated (as seems 
most likely), one may have a rapid technique 
for determining an S’s sensitivity to differences 
where this approach is applicable. 

There remain, as always, many questions con- 
cerning the type of judgmental situation that 
should be used. The answer to such questions, 
we believe, can best be formulated after the ob- 
jective of the investigator has been stated as 
precisely as possible. At such a time the par- 
ticular variables to be manipulated should emerge 
more clearly. In any case, it is obviously de- 
sirable that a person be capable of making a 
naming reaction before such a naming or other 
differential reaction is used to decide whether a 
discrimination is possible. Similar errors can be 
avoided if one thinks of discrimination as being 
partly a function of materials, partly of pro- 
cedure, and if one thinks of it, further, as exist- 
ing in various degrees of precision or exactness. 


Summary 
It was proposed that the use of a recog- 
nition judgment, as employed in previous 
studies of cigarette and cola brand discrimination,
is not the most appropriate test of
discriminability. Therefore, in the present 
study the use of a recognition judgment and 
an affective (like-dislike) judgment were com- 
pared. A total of 246 regular cigarette 
smokers were divided by alternation into 
two groups. Members of one group made a 
recognition judgment, the other a like-dislike 
judgment. Each S was given one of the “Big 
Three” cigarettes with brand name obscured. 

Results, analyzed by analysis of variance
using the arcsine transformation, were as follows:

1. Both types of judgment were made with 
better than chance accuracy. 

2. The like-dislike judgment technique was 
slightly more sensitive than the recognition 
judgment, but not significantly so. 

3. The distribution of responses (dichoto- 
mous in both cases) for each type of judg- 
ment was radically different, suggesting that, 
while about equally sensitive as applied to 
the present problem, each is an expression of 
a different type of psychological function. 

4. It was suggested that the use of an af- 
fective judgment may have a greater appli- 
cability to problems of discrimination than it 
has enjoyed, and merits further study in this 
respect. 


Received July 7, 1953. 


Pointing Accuracy of a Joy Stick Without Visual Feedback


W. D. Garvey and W. B. Knowles 1


Naval Research Laboratory, Washington, D. C. 


The following problem was suggested by a 
practical situation in which it was necessary 
to know how accurately an operator could 
point a joy stick at a target without visual 
reference to the position of the stick. The
problem as it was investigated may be de- 
fined as determining man’s ability to point a 
joy stick at a series of small points of light 
displayed in the space of an otherwise totally 
dark room. 


Procedure 


Apparatus. A general pictorial representation 
of the apparatus is presented in Figure 1. The 
joy stick was an aluminum rod, one-half inch in 
diameter, 6.5 inches long, mounted on the shafts 
of two potentiometers. When the stick was 
pointed at a target, the horizontal and vertical 
components of the stick’s position were con- 
verted by the potentiometers into voltage read- 
ings, which were calibrated in terms of degrees 
of deviation in azimuth and elevation. 

The joy stick was located on a pedestal in the 
center of the dark room 68 inches from the for- 
ward wall, above the floor, and below the ceil- 
ing. The Ss sat in a chair on a platform im- 
mediately behind the pedestal so that the joy 
stick was at about stomach level; they were in- 
structed to grasp the stick in the right hand, 
palm up and thumb extended forward along the 
top of the stick.

The targets were 24 stationary small lights lo- 
cated at predetermined positions about the room. 
Since the room was totally dark and the bright- 
ness of the targets was very low, Ss were able to 
detect the target with little knowledge of the rela- 
tive distance involved. The targets may be re- 
garded as lying on the surface of a sphere of un- 
specified diameter, the center of which was lo- 
cated at the joy stick. When the joy stick was 
pointed directly ahead of S it indicated the zero- 
zero point for the azimuth and elevation dimen- 
sions of the targets. This center point (C in 
Figure 1) was denoted by a red cross (illumi- 
nated between trials only) which was mounted 
at the center of the forward wall, 68 inches from 
the floor. The positions of the targets were located
in terms of degrees of azimuth and elevation
from point C. These positions are given in
Table 1. The schema of these positions may be
interpreted with the aid of Figure 1; e.g., target
No. 9 (labeled as such in Figure 1) was located
16° below and 16° to the right of point C. The
position of each of the other targets may be
similarly interpreted.


1 The authors wish to acknowledge the valuable
assistance of Mr. Manus Munger and Mr. Gerald
Hasson, formerly at this Laboratory, who assisted
with the experiment and carried out the major portion
of data analysis.

Method. The Ss were seven Naval enlisted
men stationed at the Laboratory to serve as subjects;
all Ss were right-handed. Without previous
dark adaptation Ss would have had better
visual acuity at the end of an experimental trial
than at the beginning. Since it was desired to
maintain constant the visual component of the
task, Ss were given 20 minutes of dark adaptation
before an experimental session.


Table 1

Target Positions and Response Errors
N = 28

          Target Position*        Response Errors
             (Degrees)               (Degrees)
Target
  No.    Elevation   Azimuth       Mean     S.D.

   1        -74        +56         10.3      †
   2        -74        -45         10.4      †
   3        -46        +79         16.0      †
   4        -46        +16         13.9      †
   5        -46        -11         15.0      †
   6        -46        -74         15.7      †
   7        -14        +76         12.8      †
   8        -15        +44         11.8      †
   9        -16        +16          8.8      †
  10        -16        -14†         9.9      †
  11        -14        -45          8.7      †
  12        -14        -74         10.7      †
  13        +16        +75         16.9      †
  14        +15        +45         11.9      †
  15        +15        +15         12.3      †
  16        +15        -15         10.9      †
  17        +15        -42         12.4      †
  18        +15        -73         13.9      †
  19        +47        +75         20.5     10.0
  20        +47        +15         13.6      8.0
  21        +47        -15         11.7      6.6
  22        +47        -74         17.9     11.3
  23        +75        +45         18.2      9.4
  24        +73        -40         17.1      7.8

* A plus sign indicates upward elevation and right
azimuth; a minus sign indicates downward elevation
and left azimuth.
† Illegible or uncertain in the source copy.

The task was considered to be a relatively un- 
familiar one for Ss. Therefore, before the ex- 
periment proper, each of the seven Ss was given 
two practice periods (one per day). This prac- 
tice amounted to four trials of pointing the stick 
at each of the targets. 

The experiment proper began on the day fol- 
lowing practice. The Ss were given four experi- 
mental trials, two per day for two successive 
days. An experimental trial consisted of one 
presentation of each of the 24 targets. The targets
were presented to Ss in a different randomized
order each trial.

The S was given as much time as was needed 
to make the pointing response, and was in- 
structed to report when he considered the stick 
to be pointing at the target. The E immediately 
recorded the position of the stick, extinguished 
the target light, illuminated the center cross, and 
then instructed S to return the stick to the 
center position. The next aiming response was 
initiated by having S move the stick from this 
centered position to a position of pointing at a 
new target. Thus each aiming response began
and ended by having the stick pointed at the
center cross.

Fig. 1. Schematic representation of experimental situation.

The S’s pointing responses were measured in 
terms of degrees of elevation and azimuth devia- 
tion of the stick’s pointing position from the tar- 
get’s position. This elevation and azimuth devia- 
tion was later transformed into a great circle 
deviation, which was a measure of the direct 
angular displacement between the pointing po- 
sition of the stick and the position of the target. 
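
The transformation into a great circle deviation is not spelled out in the text. A minimal sketch of one way it could be computed is given below, assuming that elevation and azimuth can be treated as ordinary spherical coordinates centered on the joy stick; the geometry of the actual potentiometer mounting may have differed, and the function names are illustrative only.

```python
import math

def direction(elev_deg, azim_deg):
    """Unit vector for a direction given as (elevation, azimuth) in degrees.

    Assumed convention: x is to S's right, y is up, z is straight ahead
    toward point C, so (0, 0) points at the center cross.
    """
    el, az = math.radians(elev_deg), math.radians(azim_deg)
    return (math.cos(el) * math.sin(az),
            math.sin(el),
            math.cos(el) * math.cos(az))

def great_circle_deviation(target, response):
    """Angle in degrees between two directions, each an (elev, azim) pair."""
    t, r = direction(*target), direction(*response)
    dot = sum(a * b for a, b in zip(t, r))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))
```

Under this convention an azimuth error of a given size counts for less at high or low elevations than near the horizon, which is the usual reason for preferring a single great circle measure to separate elevation and azimuth errors.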

The Ss were never given any knowledge of the 
correctness of their responses or the direction of 
their errors. 


Results 

Magnitude of Errors. The mean error for
all Ss and all targets was 13.4°, with a range
for single stimuli from 1° to 52°. Table 1
presents the magnitude of errors to specific
targets and the respective standard deviations.
Mean errors with respect to location are presented
in Figure 2. The data indicate that
response errors to some targets were greater
than to others. The smallest errors appear
to have been made to targets just below the
center of the room (i.e., targets 7 through 12,
all approximately 15° below the center) and
to the extremely low targets (i.e., targets 1
and 2, approximately 75° below the center).
Generally speaking, the largest errors were
made to targets located on the extreme right
and above the center of the room; next
largest errors were made to targets on the
extreme left and above. Statistical analysis²
indicated that response errors to targets
above the center of the room were significantly
greater (p < .05) than those below
the center; in addition, there was a propensity
for right response errors to be greater
than left response errors (p < .10).

Legend for Figure 1: C = center point; 9 = position of target No. 9;
S and hand stick are located in center of room.

Fig. 2. Diagrammatic representation of magnitude and direction of
response errors. Mean errors, in degrees of arc deviation from target,
are presented for each target relative to center point (C); significant
directional error deviation is indicated by direction of pointed arrow.

Direction of Errors. There was a tendency 
for Ss to err in a particular direction for specific
targets. Statistical sign tests³ were
made on the direction of Ss’ response errors.

2 Dixon, W. J. and Mood, A. M. The statistical
sign test. J. Amer. statist. Ass., 1946, 41, 557-566.
3 See footnote 2.


The arrows in Figure 2 give an indication of 
the direction of the errors made to a specific 
target; for example, the upward pointing 
arrow at target No. 4 indicates that a sta- 
tistically significant (p < .05) number of Ss’ 
responses were in the direction of the stick 
pointing above the target. The right-left 
arrows may be similarly interpreted. There 
were two directional tendencies in elevation 
errors. For responses made to targets below 
the center and 15° above the center there was 
a tendency for Ss to aim above the targets. 
However, for those targets 45° or more above 
the center there was a tendency for Ss to aim 
below the targets. There also appear to be 
two directional tendencies in azimuth errors. 
In general, Ss tend to aim to the left of tar- 
gets located on the extreme right and to the 
right of targets on the extreme left; i.e., Ss 
are disposed to undershoot the targets on the 
extremes. However, there is also a tendency 
for Ss to aim to the left of targets located 
just to the left of the center of the room and
to aim to the right of targets located just to
the right of the center of the room. For these 
targets located around the center, Ss appear 
to overshoot the targets. 
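
The sign tests on direction of error can be illustrated with a short sketch. It assumes a two-sided test on counts of responses falling on either side of a target, with ties already discarded; the function name is illustrative and the counts in the usage note are hypothetical, not data from the experiment.

```python
from math import comb

def sign_test_p(n_plus, n_minus):
    """Two-sided sign test p-value for n_plus versus n_minus signed errors.

    Under the null hypothesis each response is equally likely to fall on
    either side of the target, so the smaller count follows a binomial
    distribution with p = 0.5. Ties are assumed to have been dropped.
    """
    n = n_plus + n_minus
    k = min(n_plus, n_minus)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, if 21 of 28 responses to a target fell above it and 7 below, `sign_test_p(21, 7)` would fall under the .05 level, whereas an even split gives a value of 1. The test treats responses as independent, which is only approximately true when the same Ss contribute several responses each.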


Discussion 


The results indicate that more than 90% 
of Ss’ pointing responses were within 25° of 
the targets. In most applied situations, where 
joy stick control is feasible, such accuracy is 
adequate. It is clear, however, that pointing 
accuracy is partially determined by the locus 
of the targets. The fact that Ss were re- 
quired to hold the stick in a particular fashion 
is certainly an influential factor in determin- 
ing the differential pointing accuracy for the 
various targets. Pointing at targets located 
on the extreme right and upward required a 
more “strained” muscular response on the 
part of Ss with the result that responses to 
targets located in these areas would be more 
difficult responses to make. From interviews
with Ss after the experiment it was learned
that even though S could not see the joy
stick or his arm, he attempted to point the
stick at the targets as if he were sighting
down the right arm; i.e., S was pointing the
entire arm as well as the stick at the targets.
Such aiming was not possible with all targets, 
but it was Ss’ belief that when such aiming 
was possible, they were able to respond with 
more ease and accuracy. Such a mechanism 
operating in the pointing procedure would 
have facilitated responses made to targets to 
the left and downward. The data imply that 
these two factors, manner of hand grip and 
aiming as if with the entire right arm, may
have influenced pointing accuracy, for responses
towards targets which were downward
and to the left were more accurate than
those upward and to the right. 

Although statistical analysis indicates that 
no improvement in performance took place 
during the course of the experimental trials, 
the fact that Ss respond with consistent error 
biases would indicate that Ss may learn to 
point more accurately if they are given knowl- 
edge of their results during the course of 
practice. 


Summary 


An experiment was conducted to determine 
how accurately man can point a joy stick at 
visual targets without visual feedback as to 
the position of the stick. The results of this 
experiment may be summarized as follows: 


1. Pointing errors ranged from 1° to 52° 
with a mean of 13.4°. Ninety per cent of 
the pointing errors were 25° or less in magni- 
tude. 

2. There was a correspondence between 
magnitude of errors and locus of the visual 
target. Errors were largest for responses 
made to targets located to the right and 
above the center of the room. There was a 
tendency for responses to be more accurate if 
they were made to targets either to the left 
or below the center of the room. 

3. There was a tendency for Ss to under- 
shoot targets located around the periphery of 
the target space and to overshoot targets lo- 
cated around the center of the space. 


Received July 16, 1953. 





The Journal of Applied Psychology
Vol. 38, No. 3, 1954


Rate Accuracy in Handwheel Cranking * 


Robert S. Lincoln 
The Johns Hopkins University 


When a rate of movement is produced by a 
human operator in an attempt to reduce error 
in tracking performance, accuracy of control 
may be limited by the inability of the opera- 
tor to maintain a steady speed. His rate of 
movement may not continuously match the 
required rate. Because of this limitation it 
becomes important to determine the accuracy 
with which various rates of movement can be 
maintained. 

In a handwheel cranking task it is possible 
to introduce variation in the required rate of 
movement in three ways. 1. The required 
angular speed of rotation may be increased 
or decreased for a given radius of movement. 
When this is done, the required linear rate 
also increases or decreases. Linear and angu- 
lar rates, therefore, change in combination. 
Linear rate in this case refers to the units of 
distance traveled per unit of time by the 
handwheel knob. 2. The required linear rate 
may be changed while holding the angular 
rate constant. This is accomplished by vary- 
ing the radius of movement for any given an- 
gular rate. 3. The required angular rate may 
be changed while the linear rate is held at 
a constant value. This is accomplished by 
making compensating adjustments in the 
radius of movement as angular speed changes. 
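
The relation between the two rates can be made concrete with a short sketch. It assumes only the geometry of circular motion (the knob travels one circumference, 2πr, per revolution); the function names are illustrative.

```python
import math

def linear_rate_cm_per_s(radius_cm, rpm):
    """Distance traveled per second by the handwheel knob."""
    return 2 * math.pi * radius_cm * rpm / 60.0

def radius_for_linear_rate(linear_cm_per_s, rpm):
    """Radius needed to hold a given linear rate at a given angular rate."""
    return linear_cm_per_s * 60.0 / (2 * math.pi * rpm)
```

Raising `rpm` at a fixed radius changes both rates together (case 1); changing `radius_cm` at a fixed `rpm` changes the linear rate alone (case 2); and `radius_for_linear_rate` gives the compensating radius that keeps the linear rate fixed as angular rate changes (case 3). A 7.5 cm radius at 25 rpm, for instance, yields the same linear rate as a 2.5 cm radius at 75 rpm.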

The effects of variation in both linear and 
angular rates of movement have been studied 
with regard to performance in manual track- 
ing tasks. Helson (3) has measured the ac- 
curacy obtained with various rates of move- 
ment and different radii of cranking in com- 
pensatory tracking. Lincoln and Smith (4) 
have varied the angular and linear rates in 
combination in a direct-pursuit tracking task. 
The relationship between rate of movement 
and accuracy is complicated, however, by the 
nature of the tracking devices. 

* This work was supported by Contract N5-ori-166,
Task Order I, between the Office of Naval Research
and The Johns Hopkins University. This is
Report No. 166-I-178, Project Designation No. NR
145-089, under that contract. Miss Frances Wolfram
aided in the collection and analysis of the data.

The accuracy of direct-pursuit tracking depends
upon the combined accuracy of both
rate and positioning movements (4). This 
results from the fact that accurate tracking 
is achieved only when the operator matches 
the position of the target with a specific po- 
sition of the handwheel knob and simultane- 
ously matches the rate of target movement 
with a proportional rate of handwheel motion. 

Consistent relationships between the posi- 
tion of the target and the position of the 
handwheel knob are eliminated with com- 
pensatory tracking devices, but in both com- 
pensatory and pursuit tasks it is possible to 
achieve the required rate of movement with- 
out maintaining the alignment of cursor and 
target. For this reason tracking error rec- 
ords do not give an accurate picture of the 
operator’s ability to maintain a steady rate 
of speed. Furthermore, in both of the studies 
described, changes in the required angular 
rate of cranking were produced by changes in 
gear ratios. This procedure reduces the load 
on the handwheel as rate increases and may 
affect the relationship between speed and ac- 
curacy. 

Purpose of this Experiment 

This report is concerned with the accuracy 
with which different linear and angular rates 
of cranking movements can be maintained for 
clockwise and counterclockwise directions of 
turning. 

In order to eliminate the difficulties inher- 
ent in tracking devices and to study rates of 
movement in greater isolation, a special task 
has been devised. Four characteristics of the 
task are of particular importance. 1. The 
subject is presented with a display consisting 
of a target and cursor. The cursor instan- 
taneously indicates the rate of handwheel 
movement by its spatial position relative to 
the target. It is impossible for the subject to 
achieve the required rate of movement with- 
out aligning the target and cursor. Errors 
are recorded as errors in rate. 2. With the
apparatus used there is no relationship between
the angular position of the handwheel
knob and the linear position of the cursor.
The position of the cursor is solely dependent
upon the rate of cranking. 3. The target remains
stationary. 4. No changes in handwheel
load occur with changes in the required
angular rate of cranking. The “feel” of the
handwheel does change when linear rate is
changed for a constant angular speed. This
results from the greater leverage of the larger
handwheels.

Fig. 1. Schematic diagram of the apparatus: (1) accuracy-recording clock;
(2) electric counters; (3) display galvanometer; (4) screen; (5) crank; (6)
tachometer generator; (7) light source for revolutions counter; (8) photoelectric
cell; (9) light source; (10) recording galvanometer; (11) variable resistor;
(12) light source; (13) photoelectric cell.

Apparatus ¹

Figure 1 is a schematic diagram of the ap- 
paratus. The subject turns the crank with 
his right hand. The center of rotation is 81 
cm. above the floor and the subject is seated 
with his right arm directly in line with the 
crank shaft. Friction in the system is kept 
at a low value by the use of ball bearings, 
and there is little inertia in the handwheel. 

Attached to the end of the crank shaft is a
small pulley that drives a second pulley
mounted on the shaft of a tachometer generator.
The output of the generator drives two
mirror galvanometers that are wired in series
with the generator. Separate light sources
are focused on each of the mirrors. A patch
of light 1 cm. high and 3 mm. wide is reflected
from one of the mirrors to the back of a
ground-glass screen that is perpendicular to
the subject’s line of sight. The light strikes
the glass at a point 119 cm. above the floor.
Two vertical black lines, 3 mm. apart, are
drawn on the screen. These black lines serve
as the target. The subject’s task is to center
the reflected patch of light between the two
lines by turning the crank at a constant rate.
Variation in the required angular rate of turning
is achieved by adjusting a potentiometer
that controls the resistance in the circuit
between generator and galvanometers. The
greater the resistance in the circuit, the higher
the rate of cranking required to center the
light-spot (cursor) between the target lines.

1 The apparatus was constructed by Mr. Ervin G.
Smith, Jr. of the Engineering Laboratory, Institute
for Cooperative Research, The Johns Hopkins University.

A circular spot of light reflected from the 
second galvanometer to the surface of a pho- 
toelectric cell provides the main source of 
error indication in the apparatus. Error tol- 
erance is determined by the angular position 
of the photocell relative to the recording
galvanometer. When the subject turns the
handwheel at the required rate, plus or minus 
the chosen error tolerance, the light from the 
recording galvanometer strikes the photocell. 
This activates an electric clock that is read 
to the nearest .01 of a second. Another clock 
times the trial period and shuts off the ap- 
paratus at any selected trial-length. The 
timing and recording clocks do not begin to 
operate until the subject first reaches the re- 
quired rate of turning. Because of these fea- 
tures the number of seconds accumulated on 
the dial of the recording clock indicates the 
total time, during a trial period, in which the 
subject maintains a rate of turning within 
established tolerance limits. 

Two other indications of performance are 
obtained simultaneously. One electric counter 
indicates the number of times during a trial 
period that the subject’s rate of turning falls 
within the tolerance limits. This is a measure 
of the frequency of oscillation in rate within 
a trial. A second electric counter counts the 
number of revolutions actually turned during 
a trial. At low speeds accuracy in counting 
is achieved by counting to the nearest .2 of a 
revolution. At the higher speeds only one 
count per revolution is possible. Both the 
recording clock and the counters are enclosed 
in soundproofed boxes. 

Variation in the radius of turning is achieved 
by attaching the knob to the handwheel at 
various distances from the center of rotation. 
A counter weight is also provided. 

Error recording for counterclockwise turn- 
ing is accomplished by reversing the tachome- 
ter generator end-for-end since the generator 
does not produce reliable voltages when the 
direction in which the shaft turns is reversed. 


Procedure 


Only right-handed male subjects were used 
in the experiment. Fifteen subjects turned 
the handwheel in the clockwise direction, and 
fifteen different subjects turned the hand- 
wheel in the counterclockwise direction. All 
subjects received five consecutive trials, 30 
seconds in length, at each of five angular rates 
of cranking combined with each of three 
crank radii. The rates used were 25, 75, 125, 
175, and 225 revolutions per minute. The 
crank radii were 2.5, 7.5, and 12.5 cm.

The orders in which the subjects cranked
under the various conditions were determined
by a 15 × 15 Latin square. The same square
was used for both directions of cranking. 
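
The report does not say how the Latin square was constructed. As an illustration only, the sketch below builds a cyclic 15 × 15 square over the fifteen rate-radius combinations, so that each condition appears once in every serial position and once for every subject; the cyclic construction and all names are assumptions, not the author's procedure.

```python
from itertools import product

RATES_RPM = [25, 75, 125, 175, 225]
RADII_CM = [2.5, 7.5, 12.5]
CONDITIONS = list(product(RATES_RPM, RADII_CM))  # 15 rate-radius pairs

def cyclic_latin_square(n):
    """n x n square in which row i is the sequence 0..n-1 shifted by i."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]

def orders_for_subjects():
    """One presentation order (a list of 15 conditions) per subject."""
    square = cyclic_latin_square(len(CONDITIONS))
    return [[CONDITIONS[i] for i in row] for row in square]
```

A cyclic square equates serial position across conditions but does not balance which condition precedes which; a square chosen at random or balanced for sequence would serve where carry-over between conditions is a concern.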

Subjects were instructed to keep the cursor- 
light centered between the two target lines at 
all times. The apparatus was so arranged 
that the cursor moved to the right as the 
speed of turning increased from zero velocity 
for clockwise turning. For counterclockwise 
cranking the cursor moved to the left as speed 
increased. The range of error tolerance was 
kept at ± 9% of the required rate of turning.
Subjects were not aware that there was any 
tolerance in the scoring system. 


Results 


Effects of Linear and Angular Rates. In 
this experiment linear rate was varied inde- 
pendently of angular rate by changing the 
radius of movement for various constant an- 
gular speeds. When the angular speed of 
movement is constant, rate-accuracy increases 
with increased linear rate in the lower range 
of handwheel speeds. At the higher hand- 
wheel speeds this relationship is reversed, and 
accuracy decreases as linear rate increases. 

Figure 2 pictures these results which are 
similar to those obtained by Helson with a 
compensatory tracking task (3). Helson, 
however, did not distinguish between angular 
and linear rates of movement. In Figure 2 
linear rate increases with increased handwheel 
radius for any one handwheel speed, but the 
actual linear rates are not equal for the same 
radius at different handwheel speeds. 

The significance of the differences between 
handwheel speeds, radii, and the interaction 
between these two variables was tested by the 
non-parametric analysis of variance described 
by Friedman (2) and Wilcoxon (6). This 
procedure was necessary because the data 
exhibited heterogeneity of variance when sub- 
jected to Bartlett’s test (1). For both direc- 
tions of turning the two main variables and 
the interaction between them were significant 
sources of variation (p < .01). The results 
for the two directions of movement are com- 
bined in Figure 2 because they show similar 
tendencies. 

Figure 3 is a graph of the relationship between
accuracy and angular rate when linear
speed is constant. The values in the figure
were obtained by interpolation between points
on Figure 2 and calculation of the actual
linear rates. Linear rate may be held constant
while angular rate changes by adjusting
the radius of the cranking movements.

Fig. 2. Accuracy as a function of handwheel
speed for different crank radii. The ordinate values
indicate the mean time that the rates were maintained
within specified tolerance limits. A score of
thirty is the maximum possible accuracy score.

Figure 3 shows that, for various constant 
linear rates, rate-accuracy increases with an- 
gular speed up to about 175 rpm. For the 
lower linear rates it appears that slightly 
greater accuracy might be obtained at speeds 
beyond 175 rpm. It would be necessary 
to extrapolate values in order to extend all 
curves over the entire range of angular speeds. 

There is one factor which does not remain 
constant when linear speed is held at a fixed 
value by adjusting handwheel size as angular 
speed increases. Different muscles become 
involved in the control of the handwheel as 
size is changed. This factor is of relatively 
little importance for the handwheels used in 
this study. 

It might be expected that increased accu- 
racy would result from increased angular or 
linear rates of movement since, as Helson has 
pointed out (3), the absolute sensitivity of 
the handwheel decreases as rate increases.
For example, with a ± 9% error tolerance,
the range of permissible error is about ± 2.2
rpm at a rate of 25 rpm, and ± 20 rpm
at a rate of 225 rpm. Inspection of Figure
2, however, indicates that at the higher hand- 
wheel speeds little advantage is taken of the 
decreased sensitivity. For the larger radii 
accuracy actually drops off with handwheel 
speeds greater than 75-125 rpm. This ef- 
fect cannot be related to the physical limita- 
tions of the subjects. Inspection of the rec- 
ords obtained with the revolutions counter 
showed that, with 11 minor exceptions in 450 
trials, all subjects were capable of cranking 
with all radii at an average angular rate 
greater than 175 rpm when attempting to 
achieve the rate of 225 rpm. 
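
The arithmetic of the tolerance band can be sketched as follows, assuming the tolerance is a fixed proportion (9%) of the required rate applied symmetrically on both sides; the function names are illustrative.

```python
def tolerance_band(required_rpm, tolerance=0.09):
    """Lower and upper limits of the scored band around the required rate."""
    half_width = tolerance * required_rpm
    return required_rpm - half_width, required_rpm + half_width

def within_tolerance(actual_rpm, required_rpm, tolerance=0.09):
    """True when the actual rate would be scored as on target."""
    low, high = tolerance_band(required_rpm, tolerance)
    return low <= actual_rpm <= high
```

At 25 rpm the band is roughly 22.75 to 27.25 rpm, and at 225 rpm roughly 204.75 to 245.25 rpm, so the permissible error in absolute terms is about nine times wider at the highest speed. An average rate only slightly above 175 rpm, although well within the subjects' physical capacity, would still fall outside the band for the 225 rpm condition.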

The data concerning the frequency of oscil- 
lations in rate within a trial suggest that the 
oscillatory nature of cranking movements 
places a limit on the rate-accuracy achieved. 
These frequencies are shown in Figure 4 in 
which the data for the two directions of turn- 
ing are again combined. 

A single oscillation in rate, as measured in
this experiment, includes both the change in
rate greater than the error tolerance and the
return to a rate within the error tolerance.
From Figure 4 it is apparent that the number
of oscillations increases with both linear
and angular rate in spite of the reduction in
sensitivity. For the two larger radii the number
of oscillations increases at an increasing
rate as the required handwheel speed is raised.
In contrast, the number of oscillations for the
smallest radius increases at a much slower
rate.

Fig. 3. Accuracy as a function of handwheel
speed for different constant linear rates. The values
on the curves were obtained by interpolation between
points in Figure 2 and calculation of the
linear rates.

Non-parametric analysis of variance estab- 
lished the significance of the over-all effect of 
handwheel speeds and radii upon the number 
of oscillations per minute (p < .001). 

Figure 5 shows the durations of the mean 
rate-oscillations in seconds for the various 
handwheel speeds and radii. The durations 
plotted in the figure were obtained from Fig- 
ures 2 and 4. The accuracy scores for each 
point in Figure 2 were first subtracted from 
the maximum possible score of 30 seconds. 
The resulting values indicated the time spent 
outside of the error tolerance in an average 
trial. These scores were then divided by the 
number of oscillations in rate for the appropriate
points shown in Figure 4. The obtained
values were the mean durations of the
rate-oscillations in seconds. Figure 5 shows
that the duration of the mean oscillations in
rate is a decreasing function of handwheel
speed.

Fig. 4. Frequency of rate-oscillations as a function
of handwheel speed for different crank radii. A
single oscillation includes both the change in rate
greater than tolerance limits and the return to a
rate within tolerance limits.

Fig. 5. Durations of mean rate-oscillations.
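
The derivation of the mean durations can be sketched in a few lines. The sketch assumes that the accuracy score and the oscillation count refer to the same 30-second trial period; the function name and the figures in the usage note are hypothetical and are not values read from Figures 2 and 4.

```python
TRIAL_LENGTH_S = 30.0

def mean_oscillation_duration(time_on_target_s, n_oscillations,
                              trial_length_s=TRIAL_LENGTH_S):
    """Mean time in seconds spent outside tolerance per rate-oscillation.

    time_on_target_s is the accuracy score (time within tolerance), so the
    remainder of the trial is the total time spent outside tolerance.
    """
    if n_oscillations <= 0:
        raise ValueError("at least one oscillation is needed")
    return (trial_length_s - time_on_target_s) / n_oscillations
```

With a hypothetical accuracy score of 21 seconds and 45 oscillations in the trial, the time off target is 9 seconds and the mean duration is 0.2 second per oscillation. The same score with twice as many oscillations would halve the mean duration, which is the trade-off the two figures are used to examine.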

Considered together, Figures 4 and 5 show 
why accuracy in Figure 2 does not continue 
to increase above certain speeds as the sensi- 
tivity of the handwheel decreases. At the 
lower handwheel speeds the number of rate- 
oscillations increases slowly as angular and 
linear speeds increase in combination, but the 
durations of the oscillations decrease rapidly. 
More errors are made, but they are elimi- 
nated much more quickly. The result is in- 
creased accuracy with increased speed. In 
the middle range of handwheel speeds the 
durations of the oscillations still decrease as 
speed increases, but at a much slower rate. 
At the same time, however, the number of 
oscillations is increasing rapidly. The result 
is a levelling-off and even a decrease in ac- 
curacy. 

This interpretation also accounts for the 
relative accuracy obtained with different radii 
of movement. At the low handwheel speeds, 
for example, a greater number of oscillations 
appear with the larger handwheels. However, 
the durations of the oscillations are shorter 
for the larger handwheels and increased ac- 
curacy results. 


Figure 5 provides suggestions concerning
the nature of the responses involved in maintaining
a rate of movement with a handwheel.
The durations of the average oscillations are
too short to have allowed the initiation of
corrective responses on the part of the subjects.
In having access to the instantaneous
indication of rate provided by the apparatus, 
subjects received more information than they 
could use. Apparently subjects accepted the 
unsteadiness in their movements as being be- 
yond their control and tried to center their 
oscillations about the target in order to mini- 
mize error. These results support the sug- 
gestion of Lincoln and Smith (5) that accu- 
racy in direct pursuit-tracking with a hand- 
wheel control is limited by the oscillatory 
nature of the tracker’s response. 

Figure 6 indicates the number of oscilla- 
tions in rate per handwheel revolution. At 
the higher handwheel speeds the relative num- 
ber of oscillations is fairly constant for the 
different radii. The actual number approaches 
a rate of one oscillation per revolution. This 
may mean that there is a more pronounced 
tendency for deviations in rate to occur when 
the handwheel knob is in a particular spatial 
position. The most likely position would be 
at the top or bottom of the swing during rotation.

Effects of Direction of Cranking. Although
the accuracy achieved in clockwise turning 
was slightly higher than for counterclockwise 
turning on four of the five speeds, these dif- 
ferences were not significant when tested by 
the necessary non-parametric methods. Dif- 
ferences between directions were established, 
however, in another measure of performance 
—the constant rate error. 

Figures 7 and 8 show the constant errors 
in the average rate of cranking in rpm. 
Figure 7 is for clockwise turning, while Fig- 
ure 8 is a plot of the data for counterclock- 
wise turning. For both directions the sign of 
the constant errors indicates that, considered 
as groups, the subjects tended to crank at 
rates that were slower than the required rate, 
although they were capable of turning at 
higher rates. Beyond the speed of 175 rpm 
this statement does not hold since some sub- 
jects could not maintain the rate of 225 rpm. 
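
The constant error can be illustrated as follows, assuming it is the average rate derived from the revolutions counter minus the required rate, and that the counter covers the same 30-second period as the trial clock; the function names and the example figures are hypothetical.

```python
def average_rpm(revolutions, trial_length_s=30.0):
    """Average cranking rate over a trial, from the revolutions count."""
    return revolutions * 60.0 / trial_length_s

def constant_error_rpm(revolutions, required_rpm, trial_length_s=30.0):
    """Signed error in rpm; negative values mean cranking below the required rate."""
    return average_rpm(revolutions, trial_length_s) - required_rpm
```

A subject who turned 60 revolutions in a 30-second trial at a required rate of 125 rpm would have averaged 120 rpm, a constant error of -5 rpm. A negative sign of this kind corresponds to the tendency, described above, to crank more slowly than required.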

The constant errors for the counterclock- 
wise direction are significantly greater than 
for the clockwise direction (p < .01). In ad- 
dition, constant error shows a significant in- 
crease (p < .01) in size as linear rate in- 
creases with constant angular rates for both 
directions of cranking. For the counterclock- 
wise direction only, constant error shows a 
significant increase (p < .001) in size as an- 
gular and linear rates increase in combination. 
The speed of 225 rpm was not included in
the calculation of the latter two probabilities
because the error at that speed is influenced
by the physical limitations of the subjects.

Fig. 7. Constant error in revolutions per minute
for clockwise turning.

If subjects do tend to ignore their unsteadi- 
ness in rate and center their oscillations about 
the target, they do so with a persistent bias 
toward rates that are slower than the re- 
quired rate. The bias is greater for cranking 
in the counterclockwise direction. 


Summary 


Subjects cranked a handwheel at each of 
five different speeds combined with each of 
three different handwheel radii. Fifteen sub- 
jects cranked in the clockwise direction while 
fifteen other subjects cranked in the counter- 
clockwise direction. 

The subjects were provided with an in- 
stantaneous visual indication of their rate of 
cranking. In appearance the task resembled 
a conventional tracking problem. With the 
apparatus used, however, the task was re- 
duced to the maintenance of the required 
rates since positional relationships between 
the rate indicator and the handwheel knob 
were eliminated, and it was impossible for the
subjects to achieve the required rate without
aligning the indicator and target. In addi- 
tion, no changes in handwheel load were in- 
troduced by changes in the required speed of 
turning. 

At the lower handwheel speeds, rate-accu- 
racy improved with increases in the linear 
rate of movement for a constant angular rate. 
At the higher angular speeds an inverse rela- 
tionship appeared between linear rate and ac- 
curacy. Linear rate refers to the units of 
distance traveled per unit of time by the 
handwheel knob. Linear rate was varied in- 
dependently of angular rate by changing the 
radius of the movement. 
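
The relation between the two rates can be sketched as follows, assuming the knob travels one circumference of radius r per revolution. The radii used here are hypothetical placeholders, since the actual handwheel radii are not given in this summary.

```python
import math


def linear_rate(radius, rpm):
    """Distance traveled by the knob per minute (in the units of `radius`)."""
    return 2 * math.pi * radius * rpm


def rpm_for_linear_rate(radius, target_linear_rate):
    """Angular rate (rpm) needed to reach a given linear rate at this radius."""
    return target_linear_rate / (2 * math.pi * radius)


# Hypothetical radii, in inches; the angular rate is held constant.
for radius in (1.0, 2.0, 4.0):
    print(radius, round(linear_rate(radius, 175), 1))
```

Holding rpm fixed and doubling the radius doubles the linear rate, which is how the two rates can be varied independently.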

For constant linear rates accuracy always 
improved with increased angular rate up to 
about 175 rpm. The failure of accuracy to 
continue to improve above a certain point 
when linear and angular rates were increased 
in combination was attributed to the oscil- 
latory nature of cranking movements. 

Subjects tended to crank at rates slower 
than the required rate although they were 
capable of maintaining the required rate. 
This tendency increased as both linear and 
angular rate increased. No significant dif- 
ferences in accuracy appeared between the 
two directions of movement, but those sub- 
jects who cranked in the counterclockwise di- 
rection showed a significantly greater tend- 
ency to lag in rate.


Reply to Dr. Wells and to Miss Epstein *


Howard D. Hadley 
Morey, Humm, and Johnstone, Inc., New York, N. Y. 


Dr. Wells is correct in his analysis. I 
should have used credulity. 

I should also like to make some comments 
about Miss Epstein’s note. In therapy, you
are concerned, at least at first, with reducing 
threats. In advertising, where you are deal- 
ing with more “normal” persons, the task at 
hand is to offer enhancements to the con- 
sumer. In this latter case, threat is some- 
thing to be avoided, not to be “cured.” In 
both instances, a sympathetic atmosphere is 
a primary requisite. I was mainly concerned 
with the atmosphere created by both adver- 
tisers and therapists. 

Miss Epstein is somewhat correct when she
says that the non-interference principle is not
applicable to advertising. If there were no
“interference,” there would be no selling.
However, doesn’t a patient have an attitude
towards the therapist at the end of the ses-
sions? Also, aren’t these attitudes often fa-
vorable? By not “interfering,” may not a
favorable attitude be developed?

* Wells, F. L. Comment on word meaning. J.
appl. Psychol., 1954, 38, 133. Epstein, Mary. A
note on “the non-directive approach in advertising
appeals.” J. appl. Psychol., 1954, 38, 133-134.

Actually, there is no pure example of in- 
ferred advertising. Direct and inferred were 
contrasted to sharpen the concept. Just as 
there are few (if any) completely introverted 
or extroverted persons, there are few (if any) 
advertisements which are completely direct or 
inferred. 

In the end, it is the atmosphere created by 
the advertisement that is important. Direct- 
inferred, directive-nondirective, are more logi- 
cal constructs than useful tools. It’s the end 
result that is important. 


Note on the Work of the British Standards Institution 


K. F. H. Murrell 


Department of Ergonomics, T. I. (Group Services) Limited, Birmingham, England


In 1949, when I was Head of the Naval 
Motion Study Unit in the British Admiralty, 
I was invited to serve as an Admiralty repre- 
sentative on a British Standards Institution 
Committee, which was considering the design 
of pressure gauges. I had for some time been 
studying published research of workers such 
as C. J. Berger, A. Chapanis, W. F. Grether,
W. R. Garner, W. E. Kappauf, R. B. 
Loucks, R. E. Sleight, S. D. S. Spragg, 
and M. J. Warwick on factors influencing the 
readability of dial faces, and I advised this 
Committee of these findings. Most of them 
differed too radically from existing practice
for them to be acceptable to the trade as
standards but some were adopted mainly by 
way of footnotes and recommendations. 
The British Standards Institution subse- 
quently decided to set up a Technical Com- 
mittee to produce a “code of practice” on the 
graduation and marking of instruments, using 
as its starting point an Admiralty Report 
(Naval Motion Study Report No. 48) which 
I had written summarizing all available re- 
search on the subject. This Committee has 
now been sitting for two years and not only 
has it considered the findings of existing re-
search but it has also arranged for further re-
search to be done to fill gaps in our knowl-
edge. One such experiment is being carried
out at the moment on the relationship be-
tween dial size and accuracy of reading.
This is, I believe, the first instance of psy-
chological research being used as the basis of 
deliberations by the British Standards Insti- 
tution. 


Personnel Psychology and Small Business 


W. Grant Dahlstrom 


University of North Carolina 


The psychologist acting as consultant to a 
small concern operates under serious restric- 
tions on his usual personnel methods. Often 
the selection of employees is only a small part 
of his general services. The number of new 
men being hired is small, the turnover rate 
may be infinitesimal, and even a survey of the 
men on the job may yield paltry information 
in comparison with usual employment studies. 
Ideally this sort of operation should be car- 
ried on at a national level with many or all 
of the local concerns of the same sort partici- 
pating in the project. This would probably 
only be feasible through arrangements be- 
tween an association of these concerns and 
some psychological consultants similar in 
scope to the Psychological Corporation. Lack- 
ing these connections, the psychologist locally 
still faces these persistent difficulties. 

When the local concern invests considerable 
time and money in job training a professional 
level man, the contribution of the psychologi- 
cal consultant in screening applicants could 
be valuable. The method of choice in han- 
dling this problem would include the utiliza- 
tion of screening tests. But the situation is 
quite frustrating because of the difficulties in 
the way of establishing score standards on 
these tests. In the opinion of the writer, 
there are three lines of evidence which can 
lend support to the consultant in making his 
judgments and recommendations. 

One of these is the degree of homogeneity 
in the test data provided by a survey of exist- 
ing employees. Recently the writer obtained 
such data on a firm of medical consultants
providing a business advisory service to phy-
sicians and dentists. There were only 10 field 
men and 2 central office men who had previ- 
ously acted as medical consultants themselves. 
The results on the Strong Vocational Interest 
Blank gave a compelling impression of homo- 
geneity. (In the group, 5 men had A ratings 
in both areas VIII and IX; 4 in IX; and 3 
others in VIII. The only other A ratings oc- 
curred in area III twice, area V twice, and 
areas VII and X once each. Very low rat- 
ings appeared in area II in more than half 
the group.) The MMPI results, though less 
uniform, were also rather homogeneous. (The 
triad of scales Pd, Pa, Ma were at a codable 
level (above 54 T-score) in half the group, 
with Pa appearing at this level in 10 of the 
12 tests. The K scores ranged above 60 T-
score without exception.)

Obviously such a concept of homogeneity 
is relative. The men are very similar when 
you consider the assorted patterns from men- 
in-general. The uniformity is less impressive 
if reference is made to male college graduates 
only, or to business majors, or even more ap- 
propriately to those men with sufficient train- 
ing to be considered at all for employment in 
such a company. Nevertheless, this seems a 
workable concept. If sufficient data were 
available in a usable form on various general 
groups, the psychologist could make a judg- 
ment about the homogeneity resulting from 
selective survival. Not a great deal of the 
normative data on our multiscale tests is pub- 
lished in a form in which we can judge rela- 
tive frequency of particular score combina-
tions. A quantitative index of relative uni-
formity could be devised which would serve 
better than judgment in this matter. 
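
One possible form such an index might take is sketched below: the mean pairwise overlap (Jaccard similarity) between the sets of interest areas in which each man earns a high rating. This is an illustrative assumption, not an index proposed in this note, and the profiles shown are hypothetical rather than the survey data.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two sets of high-rated areas (1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def uniformity_index(profiles):
    """Mean pairwise Jaccard similarity across a group (needs 2+ profiles)."""
    pairs = list(combinations(profiles, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical profiles: areas of the interest blank rated A for each man.
group = [{"VIII", "IX"}, {"VIII", "IX"}, {"IX"}, {"VIII"}]
print(uniformity_index(group))
```

Since homogeneity is relative, the same index would have to be computed on a suitable reference group, and only the difference between the two values interpreted.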

This is not meant to imply that experience 
on the job, faking of test responses, and simi- 
lar sources of variance may not also be op- 
erating to produce this uniformity, but as re- 
search information is accumulated on such 
factors as these, reasonable allowance could 
be made for them as well. This is one of the 
most workable meanings that can be offered 
for theory, as it is to be used in the areas of 
employee selection or vocational choice as 
discussed by Dr. Super in his divisional presi- 
dential address. 

The second line of evidence available to the 
psychologist involves precisely this matter of 
theory. If the findings from the test survey 
are not only satisfactorily homogeneous, but 
in addition conform to expectations based on 
considerations of the job components, then 
the consultant can feel on even firmer ground 
in making his decisions. 

In this particular instance, talks with the 
psychologically shrewd assistant manager in 
the company, as well as informal interactions 
with Dr. Clayton Gerken at the State Uni- 
versity of Iowa, led the writer to formulate 
certain characteristics of the ideal medical 
consultant. These speculations involved a 
marked interest in business detail combined 
with an equally strong interest in business 
contact. It was also expected that an execu- 
tive interest level and maturity would show 
up on the tests. This actually proved to be 
the common core of test results from the 
group. Workers more familiar with the tests
and with a wider knowledge of job break-
downs could erect a much more detailed set 
of expectations. Psychological consultation 
will always involve a modicum of this sort of 
psychologizing, even in the face of greater 
usage of actuarial methods at specific de- 
cision points. Accumulated research findings 
should facilitate the formulation of these 
working hypotheses. 

The third line of evidence stems from the 
correspondence between a man’s rated pro- 
ficiency on the job and the degree of ap- 
proximation in test score pattern to the ideal 
employee. Here restriction in range of score 
dispersion is a serious limitation, and if the 
psychologist finds even a moderate relation- 
ship, he may assume more satisfactory values 
would be obtained with a wider sampling. 

This last point is obviously the most decep- 
tive line of evidence of the three since such 
eventualities as non-rectilinearity in the cor- 
relation surface, unexpected discontinuities in 
the functions, or errors in the validity data 
may arise to embarrass him. These assump- 
tions would have to be continually checked 
against research findings from psychologists 
more favorably situated in respect to criterion 
data and research samples. 
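
The expectation that a wider sampling would yield a larger coefficient can be illustrated with the standard textbook correction for direct restriction of range. The sketch below is not a procedure given in this note: the function name and the numbers are hypothetical, and the formula holds only if the relationship is linear with constant scatter, which is exactly the kind of assumption cautioned against above.

```python
import math


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Estimate the correlation in the wider group from a restricted sample.

    Assumes selection acted directly on the test score and that the
    regression is linear with equal scatter throughout the range.
    """
    u = sd_unrestricted / sd_restricted  # ratio of score dispersions
    r = r_restricted
    return (r * u) / math.sqrt(1 - r**2 + (r * u) ** 2)


# Hypothetical case: r = .30 among employees whose test scores show
# half the dispersion of the applicant population.
estimate = correct_for_range_restriction(0.30, sd_unrestricted=10.0, sd_restricted=5.0)
print(round(estimate, 2))
```

When the two dispersions are equal the function returns the observed coefficient unchanged, which is a convenient sanity check.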

If the consultant takes the trouble to insti- 
tute a testing program and finds relative 
homogeneity, consistency with expectations, 
and fair corroboration of test findings with 
on-the-job performance, he can operate with 
considerably more confidence and effective- 
ness on even small projects than he could on 
the basis of sheer intuitive speculation alone.


Kinsey, A. C., Pomeroy, W. B., Martin, C.
E., Gebhard, P. H., et al. Sexual behavior 
in the human female. Philadelphia: W. B. 
Saunders Company, 1953. Pp. xxx + 842. 
$8.00. 


Hiltner, S. Sex ethics and the Kinsey re- 
ports. New York: Association Press, 1953. 
Pp. xi + 238. $3.00.


Aberle, Sophie D. and Corner, G. W. Twenty-
five years of sex research. History of the
National Research Council Committee for 
Research in Problems of Sex, 1922-1947. 
Philadelphia: W. B. Saunders Company, 
1953. Pp. v + 248. $4.00. 

Applied psychology has grown in direct 
ratio to the accumulation of facts. But in 
the area of sex behavior, answers to the ques- 
tion (with apologies to Dragnet) “What are 
the facts, Ma’am?” have been hard to come 
by. This is due chiefly to our puritanical 
heritage which has thrown such strong taboos 
around the subject. In a real sense, we are 
now witnessing the final retreat of the censor
in denying man the right to knowledge of the
natural functions of the body. This retreat
began 400 years ago when Vesalius dared to 
pierce some of the mystery of what goes on 
inside the human body and, 100 years later,
his disciple, William Harvey, discovered circu-
lation of the blood.

One hesitates to comment in detail con- 
cerning the epoch-making work of Kinsey 
and his co-workers. This is so because of the 
extensive reviews that have already appeared 
in connection with his earlier publication on 
the male. It is also because his book on the 
female has already received such widespread 
publicity in newspapers, magazines, and on 
radio and TV. One would only be repeating 
what so many critical and uncritical evalua- 
tors have already written or said. For this 
reason, this review merely points to the facts 
that have been marshalled in the enormous 
range of individual differences in reported ca- 
pacity, or rather claimed performance, and to 
the extensive detailed information set forth 
in Part III entitled “Comparisons of Female 
and Male.” The sex difference shown by the
fact that in 30 out of 33 items the male, on
the average, is more readily affected by psy- 
chological stimuli is worthy of special note. 
This finding provides a wealth of insight for 
better understanding of the psychology of hu- 
man males and females. 

The applied psychologist would do well to 
follow closely discussions of Kinsey’s work by 
representatives of organized religion. Rever- 
end Seward Hiltner, who is pastoral consult- 
ant to the Editorial Advisory Board of Pas- 
toral Psychology Magazine, which was founded 
in 1950, has written a detailed interpretation 
of all aspects of Kinsey’s reports. His book 
will aid many clergymen to assimilate the 
findings with a minimum of trauma. It 
should enable them to do a better job of un-
derstanding and counseling their parishioners,
young or old. Enlightened premarital and 
marriage counseling as a pastoral duty has 
been going on to an ever-increasing extent 
for a generation. Hiltner’s book will un- 
doubtedly accelerate this important move- 
ment. 

The significance of Kinsey's work and of 
Hiltner’s interpretation can be fully under- 
stood only by studying the magnificent 
achievements of the NRC Committee for Re- 
search in Problems of Sex. Aberle and Cor- 
ner’s report gives due credit to Robert M. 
Yerkes who was chairman from 1922 to 
1947. Yerkes’ foresight, initiative, tact, cour- 
age, and everlasting persistence were pri- 
marily responsible for this development. He 
was able to secure the collaboration of top- 
notch scientists. He was also able to secure 
continuity of financial support. The Com- 
mittee courageously supported research on all 
aspects of sex in all species from paramecium 
to man. Scores of researches were supported 
and hundreds of research reports were pub- 
lished in a wide range of scientific journals, 
monographs, and books. The bulk of the 
work was directed toward infrahumans but, 
from the beginning, research on sex behavior 
in man was strongly supported. The latter 
studies were begun by R. S. Lee, Adolph 
Meyer, L. M. Terman, and W. R. Miles in 
the nineteen twenties and were continued in the
thirties by Carney Landis, E. Lowell Kelly,
and Terman. Since 1937, Dr. Kinsey and his 
group at Indiana University have been the 
chief beneficiary. 

Thus psychology has moved from an intel- 
lectualistic preoccupation with man as a ra- 
tional being to a more realistic understand- 
ing of man as a behaving organism in all of 
his manifold adjustments. In short, sex can 
no longer be ignored. 

Donald G. Paterson 

University of Minnesota 


Lundin, R. W. An objective psychology of 
music. New York: Ronald Press, 1953. 
Pp. ix + 303. $4.50. 


This book is a noteworthy addition to the 
psychology of music, especially for classroom 
use with the undergraduate student. Its style 
is clear and simple, its coverage is unusually 
comprehensive, and its range is wide. It will 
truly facilitate the learning process for the 
student, an advantage which has often been 
lacking in this field. The psychology of 
music demands an understanding of two very 
different disciplines, one of them a science, 
the other an art. The vocabulary and style 
employed by the artist has often proved baf- 
fling to the scientist, and vice versa. Lundin 
has shown a special talent as an interpreter, 
and has made his material thoroughly clear 
to both. His occasional oversimplifications 
will prove justifiable in terms of the student 
who seeks competency in two fields. 

In the history of the psychology of music 
the striking advance represented by the Sea- 
shore tests has actually turned out to be one 
of the greatest disadvantages. Preoccupa- 
tion with this temptingly simple but pe- 
culiarly inadequate instrument has served as 
a block to the development of better measur- 
ing scales, and the imitators of the great pio- 
neer have done little to improve on his origi- 
nal work. Lundin has thrown all these studies 
into better perspective and his summaries and 
evaluations should save many future missteps 
in this special field of investigation. 

The most significant guideposts in this par- 
ticular area of musical research seem to point 
in the direction of cultural rather than 
nativistic explanations and interpretations.
Lundin does well therefore in taking a stand
for an interbehavioral point of view, and 
steering his readers away from the unsub- 
stantial rhapsodies of Howe and the even 
more refined semantics of later writers, to- 
ward the more substantial investigations and 
arguments, supported by something more than 
fine writing. This interbehavioristic trend 
which Lundin endorses so emphatically per- 
vades the thinking of many writers even 
though they identify themselves with less ex- 
acting and more eclectic schools. Farnsworth, 
one of its strongest advocates, has made this 
point of view much more attractive and stimu- 
lating by demonstrating its usefulness in a 
varied program of research. 

From many quarters there are evidences of 
renewed interest and activity in the problems 
of esthetics. Audiences and amateur per- 
formers in both the graphic and theater arts, 
but especially in music, are in a period of 
expansion. Now that the groundwork has 
been so carefully laid, any pedagogue or ex- 
perimentalist, and even Lundin himself, can 
go on to be more persuasive and more fruit- 
ful in developing the psychology of music. 
The whole field has been brought up to date, 
the arguments are sound, and the movement 
toward further knowledge has been greatly 
accelerated by this timely and much needed 
book. 

(Mrs.) Kate Hevner Mueller 


Indiana University 


Woolf, M. D. and Woolf, Jeanne A. The
student personnel program. New York:
McGraw-Hill Co., 1953. Pp. ix + 416.
$5.00.


Subtitled Its development and integration
in the high school and college, this book is an
“. . . attempt to picture a comprehensive
student personnel program . . .” (p. v) and
draws heavily on the authors’ twenty-six 
years of educational experience. Most pro- 
fessional workers will find much of interest 
in it. Most will also be bothered by a num- 
ber of shortcomings. 

Virtually all phases of the student person- 
nel program are dealt with. After a short 
introductory chapter on the expanding role 
of the personnel worker, there are chapters
on counseling, group methods, student govern-
ment, discipline, housing, remedial services, 
measurement, orientation, faculty advising, 
training of personnel workers, and adminis- 
tration of the program. Unfortunately, there 
is no readily discernible framework, no inte- 
grating philosophy which might have given 
added meaning to the extensive content. The 
chapters seem almost to be separate essays, 
and the demands on the reader are not al- 
ways rewarded. 

It is difficult to specify a single group for 
which this book is entirely appropriate. Pro- 
fessional workers will find parts of it quite 
elementary although the many examples will 
be interesting. Beginning students will not 
have the necessary backgrounds of informa- 
tion to give it the critical reading it requires. 
Academic administrators are likely to become 
bogged down in details which are not woven 
into a meaningful fabric. 

It is an unnecessarily difficult book to read, 
perhaps because the authors appear not to 
have been sure whether they were doing a 
primarily scholarly work or one based mostly 
on their own experiences, comments in the 
preface notwithstanding. Those sections re-
porting their experiences are probably the
best in the book. The scholarly sections are
cursory and not well done. Questionable 
scholarship is indicated, for example, by an 
acknowledgment to another staff counselor 
for pointing out that “. . . the refusal of the 
counselor to let the client lean on him may 
be actual rejection” (p. 33). The theories of 
misbehavior discussed in the chapter on disci- 
pline do not represent the best of modern psy- 
chology. Elsewhere, they say, without refer- 
ence to research, “If on the American Council 
on Education Psychological Examination, the 
L score . . . percentile ranking is twice that 
of the Q score . . . or vice versa, there is un- 
even mental development and often a per- 
sonality adjustment problem” (p. 227). 

The chapter on counseling is uneven, and 
the Chicago point-of-view is given such em- 
phasis as to suggest a pre-eminence not 
yet granted by most counselors. The chapter 
on faculty advising is excellent for its many 
practical suggestions but is marred by a 
poorly handled survey of the literature. The
authors should be commended for their forth-
right discussion of training requirements in 
which psychology is placed at the center of 
the program. 

Taken as a whole, the defects of this book 
seem to stem from two principal sources. In 
the first place, there is the previously men-
tioned apparent confusion about whether this
is a scholarly book or one primarily reporting
on experiences. Both are valuable and neces-
sary, but they need to be carefully amalga-
mated, not cut and patched together. Sec- 
ondly, there is the lack of a carefully thought 
out and explicitly stated philosophy of stu- 
dent personnel work. The tissues of a good 
book are presented, but there is no articulat- 
ing skeleton. 

John W. Gustad 

University Counseling Center, 

University of Maryland 


Leitner, K. Hypnotism for professionals. 
New York: Stravan Publishers, 1953. Pp. 
127. $4.00. 

Konradi Leitner was a stage hypnotist who 
became quite well known through his work 
with the USO during the war. In this post- 
humous book he describes his methods of 
working before an audience. He does this in 
a clear and interesting manner but he has 
contributed nothing to the scientific knowl- 
edge of hypnosis. There are a number of 
illustrations in the book using a very pretty 
feminine model with whom I’m sure any 
hypnotist would be happy to work. 

William T. Heron 


University of Minnesota 


Jennings, E. E. Techniques of successful 
foremanship. Madison: School of Com- 
merce, Bureau of Business Research and 
Service, University of Wisconsin, Wiscon- 
sin Commerce Studies, Vol. I, No. 4, 
March, 1953. Pp. 41. $1.15. 


This is the report of a study undertaken 
for the purpose of gaining an understanding 
of the techniques or traits characteristic of 
successful foremen prior to undertaking a su- 
pervisory training program. The “Jennings 
Supervisory Analysis,” a 23-item question- 
naire, was administered to 1,682 workers and
their 52 foremen in a large midwestern plant.
All workers filled out the questionnaire by 
checking items which “outstandingly” de- 
scribed their own foremen. Every third 
worker filled out a second questionnaire, 
checking the 3 items he considered most de- 
sirable in foremen. The 52 foremen filled 
out 2 questionnaires. On the first, they 
checked items which best described their own 
behavior; on the second, they checked the 3 
items they considered most desirable in fore- 
men. Foremen were rated for over-all ability
by pooling ratings of their immediate su- 
periors with those made by top management 
in the plant. An appendix describes the 
method used in obtaining and pooling these 
ratings. 

Findings presented are relative to the 23 
items in the questionnaire. This, unfortu- 
nately, places a limitation on their meaning 
because all 23 items are favorable character- 
istics, selected when the questionnaire was 
developed on the basis of being “both gen- 
erally descriptive and desirable” of foremen. 
With this limitation, findings include the fol- 
lowing: (1) the 3 most desirable techniques 
of foremen are be fair to everyone, go to bat 
for workers, and give clear-cut instructions; 
(2) there is little relationship between what 
workers feel is desirable in foremen and their 
descriptions of their own foremen; (3) there 
is little relationship between what workers 
and foremen feel is descriptive of foremen; 
(4) foremen and workers largely agree on 
what the desirable characteristics of a fore- 
man are; (5) traits or characteristics descrip- 
tive of foremen rated as successful by their 
superiors are also considered desirable by 
workers. 

There is no indication in the report of the 
audience for which it is intended. It ap- 
pears, however, that it was not intended for 
persons with technical background in that 
much of what should be included in a report 
for this group is lacking. Only gross rank- 
ings are presented, without averages or meas- 
ures of variability. Correlation coefficients 
are used without descriptions of the method 
of computation used. Interpretations of sta- 
tistical findings are also questionable in some 
cases. For example, in an item analysis in
which frequency with which a foreman was
described by an item was correlated with suc- 
cess as indicated by superiors’ ratings, 12 cor- 
relations, ranging from .28 to .53, are pre- 
sented as evidence that these 12 items are 
“highly related” to success. The discussion 
sections of the report go beyond the data 
presented, although the author does point out 
that intensive interviewing of foremen was 
an additional source of information. 
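
One common way to weigh the label “highly related” is to square each coefficient, which gives the proportion of variance in the success ratings associated with the item. The calculation below is an illustrative sketch and does not appear in the report under review; only the two endpoint correlations are taken from it.

```python
def variance_explained(r):
    """Proportion of criterion variance associated with a correlation r."""
    return r * r


# Endpoints of the range of the 12 reported item-success correlations.
for r in (0.28, 0.53):
    print(f"r = {r:.2f} -> r squared = {variance_explained(r):.4f}")
```

Even the largest of the twelve coefficients leaves most of the variance in rated success unaccounted for.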

Although this report presents objective evi- 
dence on desirable foreman characteristics, it 
is doubtful whether the author’s hope that 
the findings “. . . can be used to both clarify 
the objectives and to increase the effective- 
ness of foreman training programs” will be 
realized by persons who turn to this report 
with that same hope in mind. 

Theodore R. Lindbom 


Midland Cooperatives, Inc., 
Minneapolis, Minnesota 


Montagu, A. The natural superiority of 
women. New York: Macmillan, 1953. Pp. 
205. $3.50. 


The first question that confronts the re- 
viewer in evaluating this book is “Just why 
was it written?” The facts brought out here 
about sex differences have been available for 
some time to intelligent men and women like 
the readers of the Saturday Review for whom 
this presentation was first designed. The 
books of Amram Scheinfeld and Margaret 
Mead on the subject have been widely read. 

It seems that this particular work is a 
polemic rather than simply a popularization 
of scientific material. Ashley Montagu, like 
other writers of our time, is concerned about 
the state of our society. Many of our diffi- 
culties arise, he thinks, because of the em- 
phasis we place on aggressiveness and com- 
petition and our failure to promote loving- 
kindness and cooperation. These policies he
sees as a consequence of the long-continued
subjection of women. The psychological
characteristics in which they excel have been 
systematically devaluated and thus both men 
and women have failed to stress the values 
which alone can insure the survival of hu- 
manity. If we can become convinced that the 
female sex is biologically and psychologically
superior, that endurance and resistance to dis-
ease are more important than muscular size 
and strength, and that emotional expressive- 
ness and social perceptiveness are more im- 
portant than aggressiveness and mechanical 
skill, we shall have taken the first step to- 
ward the new emphasis our times require. 
The necessity the author sees to make the 
case that women are in all ways superior is 
responsible for the book’s major defects. In 
the first place, he insists again and again that 
what he is about to say will be shocking to 
the reader. One reads on to encounter some 
rather innocuous fact such as the difference 
in survival rates or frequency of automobile 
accidents. Secondly, his argument often leads 
him into reasoning which has sometimes been 
unkindly labeled “feminine” logic. In Chap- 
ter 4, for example, he explains first that there 
is no relationship between brain weight and 
intelligence and then goes on to argue that 
women’s brains are actually larger than men’s 
in proportion to total body size. If brain 
weight is a matter of no importance, why in- 
sist on superiority with regard to it? Thirdly,
the debating orientation produces a certain
distortion in some of the facts themselves.
On the topic of intelligence differences, for 
example, he devotes so much more space to 
listing all the kinds of evidence for superior 
verbal ability in girls than he does to sum- 
marizing the kinds of test material on which
boys excel, that his conclusion that girls do
better than boys on intelligence tests, with a 
few insignificant exceptions, appears plausible. 
On page 121 he quotes Stoddard’s statement 
as to the impossibility of evaluating sex dif- 
ferences in intelligence using our present tests, 
but by this time he has already used the 
available data to support his argument for 
female superiority. (Incidentally this is one 
of the facts he expects to be “shocking” to
us.)

Few of us would quarrel with Mr. Montagu’s 
desire to see more love and cooperation in our 
society or his conviction that good relation- 
ships between the sexes are vital. The ques- 
tion is how much such an approach as this 
contributes to these ends. On page 185 he 
asks, “Is it too much to hope that the claims 
herein made for the natural superiority of 
women will shake men out of their com- 
placent acceptance of the present position of 
the sexes?” My answer would be, “Yes, I 
am afraid it is too much to hope. People’s 
convictions are not that easily shaken.” But 
whatever it may be worth as argument, for 
serious students of differential psychology, 
the contribution made by this book to our 
factual knowledge of sex differences can 
safely be ignored. There is nothing new here. 

Leona E. Tyler 


University of Oregon